Stereo Vision-Based Simultaneous Localization and Mapping with Ranging Aid
Young-Hee Lee, Technical University of Munich, Germany / Chen Zhu, Technical University of Munich, Germany / Gabriele Giorgi, German Aerospace Center (DLR), Germany / Christoph Guenther, Technical University of Munich and DLR, Germany
Stereo vision-based Simultaneous Localization and Mapping (SLAM) is a viable tool for unmanned ground vehicles (rovers) exploring unknown, GPS-denied environments. Unlike monocular vision, stereo vision makes it possible to determine rover poses and build a map database without scale ambiguity. However, incremental errors are unavoidable, since vision-only SLAM is a dead-reckoning process in which the current state is estimated relative to the previously determined pose.
Loop closing is a technique commonly applied to mitigate this error accumulation. Map points in the current scene are continuously matched against a database (loop detection), and when matches are found, the accumulated drift is corrected (loop correction). This effectively compensates for the errors accumulated along the trajectory. However, it is challenging to detect loop closures accurately and robustly, and to correct them without introducing significant computational complexity.
Furthermore, any loop-closing technique requires the rover to revisit previously observed places. This intrinsic limitation of loop closure constrains mission planning.
We propose a different approach to resolve the drift problem in vision-based SLAM: fusion of ranging measurements with stereo vision. Ranging sensors mounted on the rover and located at a static base station provide distance measurements. These distance measurements are absolute values, independent of previously determined states, and thus enable drift compensation and a more accurate estimate of the rover's location without closing a loop.
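As a sketch, the ranging measurement at time step $k$ can be modeled as the true base-to-rover distance corrupted by additive noise (the notation here is illustrative, not taken from the paper):

```latex
\rho_k = \lVert \mathbf{p}_k - \mathbf{b} \rVert + n_k, \qquad n_k \sim \mathcal{N}(0, \sigma_r^2),
```

where $\mathbf{p}_k$ is the rover position, $\mathbf{b}$ is the known base station position, and $\sigma_r$ is the ranging noise standard deviation. Because $\rho_k$ depends only on the current absolute position, its error does not grow with the length of the trajectory.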
We present a simple graphical representation for keyframe-based stereo vision SLAM with ranging aid. Camera poses, represented by the Lie algebra of rigid transformations in 3D space, se(3), and map point positions, represented by locations in 3D Euclidean space, are the nodes (state variables) of the graph. The stereo camera provides feature points on the left and right images as visual measurements of the corresponding map points, and the ranging sensors provide scalar measurements of the distance between the static base station and the rover. We assume the location of the static base station is known, so it is not a state variable. From the graph, we formulate a global bundle adjustment problem whose cost function is the sum of squared re-projection and ranging errors over all defined edges, with an appropriate weighting between vision and ranging measurements. In addition, the Huber kernel is applied to robustify the cost function. We use the Levenberg-Marquardt algorithm for the optimization.
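Under these assumptions, the cost function could take the following form (symbols and weighting are our illustration; the paper's exact formulation may differ):

```latex
E(\mathbf{x}) = \sum_{(i,j)} \rho_H\!\left( \mathbf{e}_{ij}^{\top} \boldsymbol{\Sigma}_v^{-1} \mathbf{e}_{ij} \right)
+ w \sum_{k} \rho_H\!\left( \frac{e_k^2}{\sigma_r^2} \right),
```

where $\mathbf{e}_{ij} = \mathbf{z}_{ij} - \pi(\mathbf{T}_i, \mathbf{m}_j)$ is the stereo re-projection error of map point $\mathbf{m}_j$ observed from keyframe pose $\mathbf{T}_i$, $e_k = \rho_k - \lVert \mathbf{p}_k - \mathbf{b} \rVert$ is the ranging residual, $\rho_H$ is the Huber kernel, and $w$ balances vision against ranging measurements.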
We evaluated the proposed algorithm by post-processing the estimation results of vision-only SLAM. Sequence 07 of the KITTI dataset is used as real stereo imagery, and ranging measurements are emulated from the ground-truth poses provided with the dataset, with superimposed white Gaussian noise. The resulting trajectories and map points qualitatively show that the proposed algorithm yields a more accurate solution. We also compared local accuracy via the Relative Pose Error (RPE) and global consistency via the Absolute Trajectory Error (ATE). These results quantitatively show that the proposed algorithm compensates for the state drift of dead-reckoning vision-only SLAM and, moreover, is comparable with loop closing.
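For reference, the two evaluation metrics can be sketched in a few lines of NumPy. The position-only simplification below (a rigid Kabsch/Umeyama alignment for ATE, frame-to-frame translation differences for RPE) is our illustration, not the paper's evaluation code:

```python
import numpy as np

def align_rigid(est, gt):
    """Rigidly align estimated positions (N,3) to ground truth (N,3)
    via the Kabsch/Umeyama closed-form solution (rotation + translation,
    no scale)."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:      # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    return (R @ E.T).T + mu_g

def ate_rmse(est, gt):
    """Absolute Trajectory Error: RMSE of positions after rigid alignment."""
    aligned = align_rigid(est, gt)
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))

def rpe_rmse(est, gt, delta=1):
    """Relative Pose Error (translational part only): RMSE of the
    difference between estimated and true displacements over a fixed
    frame offset `delta`."""
    d_e = est[delta:] - est[:-delta]
    d_g = gt[delta:] - gt[:-delta]
    return float(np.sqrt(np.mean(np.sum((d_e - d_g) ** 2, axis=1))))
```

ATE measures global consistency because the alignment removes only a single rigid transform, so any drift along the trajectory remains visible; RPE measures local accuracy because each term compares short relative motions.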