Enhancing Accuracy in Visual SLAM by Tightly Coupling Sparse Ranging Measurements Between Two Rovers
Chen Zhu, Technische Universität München, Germany; Gabriele Giorgi, German Aerospace Center (DLR), Germany; Young-Hee Lee, Technische Universität München, Germany; Christoph Günther, Technische Universität München and DLR, Germany
Compared with stand-alone rovers, multiple cooperative robots equipped with cameras enable a more efficient exploration of the environment, a better accessibility in partially obstructed areas to anchor points, e.g., signals from base stations or satellites, and are more robust against malfunctions of an individual platform. VSLAM (Visual Simultaneous Localization and Mapping) techniques have been developed in recent years to estimate the trajectory of vehicles and to simultaneously reconstruct the map of the surroundings using visual clues. This work proposes a tight coupling sensor fusion approach based on the combined use of stereo cameras and sparse ranging measurements between a minimal set-up of two dynamic rovers in planar motion. Applying the proposed approach, the relative position and attitude between the two rovers can be estimated. Furthermore, the VSLAM accuracy can be improved exploiting the range information. The Cramer-Rao lower bound (CRLB) of the pose estimator using the fusion algorithm is calculated, and the method is shown to outperform the one based on the vision-only approach in the simulation.
1.Objectives and Structure of the Work
Autonomous robotic platforms can be utilized in the exploration of extreme environments, e.g., extraterrestrial exploration or in disaster areas. The autonomous navigation of the robots often relies on several sensors such as mobile radio receivers, Inertial Measurement Units (IMUs), laser scanners and cameras. VSLAM (Visual Simultaneous Localization and Mapping) techniques using stereo camera rigs have been developed in recent years to estimate the trajectory of vehicles and to simultaneously reconstruct the map of the environment. In order to increase the system robustness against hazards inherent to the missions (e.g., the rover gets stuck in complicated terrains), and to improve the exploration efficiency, it is proposed to use a robotic swarm including multiple autonomous units such as ground rovers. In order to maintain a consistent global map, the relative pose between the rovers requires to be estimated accurately. Furthermore, the drift in ego-motion estimation can be potentially mitigated by applying the cooperative information exchange.
In such situation, several multi-agent cooperative VSLAM approaches have been devised. However, all the methods are based on the merging of images or maps, which requires overlapping areas of the maps and significant amounts of data transmission. By establishing a wireless radio link between two rovers, ranging measurements can be obtained using pilot signals and round-trip-delay (RTD) estimation techniques. The additional information can be used to improve the exploration based on VSLAM techniques. Using the method we proposed earlier in , the relative pose between the two rovers can be estimated by using cameras and sparse ranging measurements, without transmitting any image or feature point via the link. Nevertheless, the method is based on a loose coupling concept, which does not exploit the range measurements to improve the visual SLAM accuracy. Therefore, we propose in this work a sensor fusion method that exploits both the ranging measurements and the stereo camera images, and shows to what extent the rover pose estimation can be improved.
The organization of the paper is as follows: first of all, we define the system model and give a brief introduction of stereo-camera-based VSLAM. In the following section, the Cramer-Rao lower bound is calculated for VSLAM in planar motion based on stereo cameras. Subsequently, a sensor fusion method is proposed, which exploits a ranging link between two dynamic rovers. Then, the simulation results are provided and conclusions are drawn from the analysis.
2.Sensor Fusion using Stereo Cameras and Sparse Ranging Measurements
The system composed of two rovers arbitrarily moving in a plane. Two rovers, equipped with a stereo camera rig and a wireless radio receiver, execute SLAM tasks on the ground. The motion of both vehicles is constrained to be planar. We define a navigation frame (N) as a fixed coordinate frame with its origin at the starting location of the rover. The navigation frame of each rover is related to the world reference frame by a specific transformation dependent on the initial position and attitude of the vehicles.
By applying feature detectors, several feature points can be matched between the stereo images and tracked over frames for a period of time. To start the motion estimation, using the matched visual features in both images of a stereo camera rig, the depth d can be retrieved and the three-dimensional (3D) location of the point can be estimated. Given enough tracked features, the pose of the vehicle at the next time instant can be estimated by minimizing the reprojection error. Using the estimated pose, the 3D position of the new features detected in the new frame can be updated. As a result, the tracking can be continued as long as sufficient features can be tracked in consecutive frames.
Since the obtained motion estimates follow dead-reckoning principle, the estimation error will accumulate over time. In order to improve the accuracy of the estimation result, a global optimization for both 3D point position and the vehicle poses, i.e., bundle adjustment, is performed using K keyframes and Np map points. Consequently, by executing the tracking and optimization, each rover obtains a set of egomotion estimates expressed in its own navigation frame.
Applying the loose coupling method we proposed in , the relative pose parameters can be estimated by exploiting range measurements, which can be regarded as coarse solutions of the rovers pose before the tight-coupled integration of both vision and ranging information. Using the coarse estimates as initial values, the integrated estimation of the rovers' pose is obtained by solving the optimization derived from maximum-likelihood estimator. The CRLB of the estimator can be obtained by calculating the corresponding Fisher information matrix. The fusion algorithm does not require any common field-of-view for the two stereo rigs, making the proposed approach more flexible and efficient in exploration tasks.
A representative trajectory is generated to evaluate the performance of the sensor fusion method and the CRLB in a simulation scenario. We set 10000 feature points which are distributed randomly in the 3D space. The stereo rigs' intrinsic parameters and sensor model are provided by a real camera, a PointGrey Bumblebee2. The image sensor has a resolution of 1024*768 pixels, with pixel density 213.33 [pixels/mm]. The baseline length between the left and right camera is 12 [cm]. The 2D features are generated by using perspective projection with visibility check. White noise is added on both the 2D feature locations and the simulated range measurements.
The CRLB of the position estimation is shown in plots with respect to the ranging accuracy, represented by standard deviation of the ranging noise. It can be shown from the plot that when the ranging noise is small, the CRLB of the fusion-based method is much lower than the vision-only approach. When the ranging noise level is high, the fusion algorithm's accuracy converge to the vision-only method.
On the other hand, the relation between the CRLB and the feature location accuracy is illustrated in this work. Since the baseline length of the stereo rig is only 12 [cm] and the resolution is not considerably high, the performance of the vision-only approach degrades significantly for feature location noise with standard deviation greater than 1 pixel. On the other hand, the bound for the fusion algorithm is much lower with the aid of the ranging measurements.
It is also shown from the simulation that by applying the proposed sensor fusion method, the position estimation outperforms the vision only approach with different simulation parameters.
In VSLAM-based exploration applications, using multiple cooperative rovers can improve the efficiency and robustness. We propose a tight coupling fusion algorithm to improve the SLAM accuracy by exploiting range measurements between two rovers. The fusion approach is shown to outperform the state-of-the-art vision-only method in both theoretical analysis using CRLB and simulations.
 C. Zhu, G. Giorgi, and C. Günther, “Scale and 2d relative pose estimation of two rovers using monocular cameras and range measurements,” in Proceedings of the 29th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS+ 2016), Portland, Oregon. Institute of Navigation, 2016, pp. 794–800.