
Session C4: Vision-based Navigation Systems

A GNSS-based Monocular Visual-Inertial Odometry Method for Large-scale Scene Positioning
Zhijun He, Hongbo Zhao, Wenquan Feng, Beihang University, China
Location: Atrium Ballroom
Alternate Number 5

With the rapid development of artificial intelligence and pattern recognition, mobile robot technology is gradually being applied in more and more fields, including military, service, medical, rescue, and space exploration applications. Because these scenes are unstructured and changeable, mobile robots need to be able to perceive, move, and plan autonomously. Among all the technologies related to mobile robots, autonomous navigation is generally considered the most important. In recent years, approaches that use only a monocular camera have gained significant interest due to their small size, low cost, and simple hardware. However, monocular vision-only systems are limited and imprecise, especially in large-scale scenes such as urban and campus environments: the cumulative error is hard to eliminate, and such systems cannot recover metric scale. Therefore, there is a growing trend of using a low-cost inertial measurement unit (IMU) to assist the monocular vision system. The traditional way to combine visual and inertial positioning is loosely-coupled sensor fusion, where the IMU is only used to assist the vision-only pose estimation; such fusion is usually based on an extended Kalman filter (EKF).
Nowadays, the most widely used methods are tightly-coupled visual-inertial algorithms, based on either the EKF or nonlinear optimization, in which camera and IMU measurements are jointly optimized at the raw measurement level. Most current approaches perform well for indoor SLAM but are less than ideal for large-scale outdoor scenarios. This paper therefore puts forward a GNSS-based monocular visual-inertial odometry method for such large-scale scenes. We use the Shi-Tomasi method to extract feature points from adjacent frames. In addition, to minimize the time cost, we use the KLT (Kanade-Lucas-Tomasi) optical flow method to track the feature points between two adjacent frames and compute the current robot pose matrix. We then perform a tightly-coupled fusion of the visual and IMU measurements; based on BA (bundle adjustment) and marginalization, we obtain an estimate of the robot motion. Generally, loop-closure detection would be the next step to reduce the cumulative error, but in large-scale scenarios loop-closure detection incurs a huge computational cost and lowers the real-time performance of the system. In this paper, GNSS data are used to correct the cumulative error instead of loop-closure detection: we perform a loosely-coupled fusion of the odometry results with GNSS measurements, in which the GNSS measurements are weighted according to their variances. The details are as follows.
1) Use the Shi-Tomasi feature point method and the KLT optical flow method to extract and track feature points across adjacent frames (a minimal sketch of this step is given after this list).
2) To estimate the camera pose from the data in step 1, BA is used to solve the resulting PnP problem: given 3D points and their projected positions, the camera pose is estimated with a 2D-3D PnP method (see the PnP sketch after this list).
3) Collect the data from the IMU. We discuss the IMU error model and dynamic model, and the IMU pose is then computed with the pre-integration method (see the pre-integration sketch after this list).
4) Initialize the system. This step estimates the parameters and initial values needed for the subsequent optimization, including the metric scale, gravity acceleration, and the positions and velocities of previous keyframes. In addition, since the sampling rate of the IMU is much higher than the camera frame rate, another important purpose of this step is to align the IMU and camera estimates.
5) Run the tightly-coupled estimator on the visual and IMU measurements. This step uses a sliding-window nonlinear estimator over the feature point coordinates, camera poses, and IMU parameters. Since the IMU parameters are brought into the calculation, the cost function must account for both the visual re-projection error and the IMU error (see the cost-function sketch after this list). Moreover, as the system runs, the number of feature points and pose estimates keeps growing and the computational load increases, so marginalization is executed at the end of this step to bound the problem size.
6) After step 5, we have the position, velocity, and yaw angle of the visual-inertial system. Since our system is designed for large-scale outdoor scenarios, GNSS data can be used to eliminate the cumulative error. We use a GPS/BD (BeiDou) dual-mode receiver module to collect position and velocity data and perform a loosely-coupled fusion, with weights based on the corresponding variances (see the fusion sketch after this list).
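
To make step 1 concrete, below is a minimal sketch (not the authors' implementation) of Shi-Tomasi corner detection and KLT tracking between two adjacent frames using OpenCV; the image file names and the detector/tracker parameter values are illustrative assumptions.

import cv2

# Load two adjacent grayscale frames (paths are placeholders).
prev_img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr_img = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corner detection on the previous frame.
prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=150,
                                   qualityLevel=0.01, minDistance=30)

# KLT (pyramidal Lucas-Kanade) optical flow to track the detected
# corners into the current frame.
curr_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_img, curr_img,
                                                 prev_pts, None)

# Keep only the successfully tracked points.
good_prev = prev_pts[status.flatten() == 1]
good_curr = curr_pts[status.flatten() == 1]
print(f"tracked {len(good_curr)} of {len(prev_pts)} features")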
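For step 2, the following sketch shows how a camera pose can be recovered from 2D-3D correspondences with OpenCV's PnP solver; the intrinsics, the synthetic landmarks, and the use of solvePnPRansac are assumptions made only to keep the example self-contained (in the paper, the PnP problem is solved within BA).

import cv2
import numpy as np

# Assumed pinhole intrinsics (placeholder values).
K = np.array([[460.0, 0.0, 320.0],
              [0.0, 460.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# Synthetic landmarks and a known pose, used here only so the sketch is
# self-contained; in the system these come from triangulation and the
# KLT tracks of step 1.
pts_3d = np.random.uniform(-2.0, 2.0, (50, 3)) + np.array([0.0, 0.0, 6.0])
rvec_true = np.array([0.05, -0.02, 0.01])
tvec_true = np.array([0.3, -0.1, 0.5])
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_true, tvec_true, K, dist)

# 2D-3D PnP (with RANSAC to reject outlier tracks).
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, dist)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the camera pose
print("recovered t:", tvec.ravel(), " true t:", tvec_true)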
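For step 3, a minimal NumPy sketch of IMU pre-integration between two camera frames; bias, noise, and covariance propagation are omitted for brevity, and the sample rate and measurements are synthetic.

import numpy as np

def skew(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def so3_exp(phi):
    """Rodrigues' formula: rotation matrix from a rotation vector."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    A = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * A + (1 - np.cos(theta)) * A @ A

def preintegrate(gyro, accel, dt):
    """Accumulate relative rotation/velocity/position increments between
    two camera frames from raw body-frame IMU samples. Gravity is
    compensated later, when the increment is composed with the state."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        dp += dv * dt + 0.5 * (dR @ a) * dt**2
        dv += (dR @ a) * dt
        dR = dR @ so3_exp(w * dt)
    return dR, dv, dp

# Synthetic IMU stream at 200 Hz between two frames 50 ms apart.
dt = 1.0 / 200.0
gyro = np.tile([0.0, 0.0, 0.1], (10, 1))    # slow yaw rotation
accel = np.tile([0.0, 0.0, 9.81], (10, 1))  # gravity-dominated
dR, dv, dp = preintegrate(gyro, accel, dt)
print("delta v:", dv, "delta p:", dp)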
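For step 5, a schematic sketch of the sliding-window cost, which sums visual re-projection residuals and IMU pre-integration residuals, each weighted by its information matrix, plus a prior term left behind by marginalization; all residuals and information matrices below are placeholders.

import numpy as np

def mahalanobis(residual, info):
    """Squared residual weighted by its information matrix."""
    return float(residual @ info @ residual)

def sliding_window_cost(reproj_residuals, reproj_info,
                        imu_residuals, imu_info,
                        prior_residual, prior_info):
    """Cost minimized by the sliding-window estimator: visual
    re-projection terms + IMU pre-integration terms + a prior term
    from the marginalized states."""
    cost = mahalanobis(prior_residual, prior_info)
    cost += sum(mahalanobis(r, W) for r, W in zip(reproj_residuals, reproj_info))
    cost += sum(mahalanobis(r, W) for r, W in zip(imu_residuals, imu_info))
    return cost

# Placeholder residuals, only to show the shapes involved.
reproj = [np.array([0.4, -0.2])] * 3        # 2-D pixel errors
reproj_W = [np.eye(2) / 1.5**2] * 3         # ~1.5 px measurement noise
imu = [np.random.randn(9) * 0.01] * 2       # delta p, v, rotation errors
imu_W = [np.eye(9) * 1e2] * 2
prior, prior_W = np.zeros(15), np.eye(15)
print("total cost:", sliding_window_cost(reproj, reproj_W, imu, imu_W, prior, prior_W))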
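For step 6, a minimal sketch of the loosely-coupled fusion idea, combining the VIO position with a GNSS fix by inverse-variance weighting; the variance values are illustrative, not the receiver's actual figures.

import numpy as np

def fuse_position(p_vio, var_vio, p_gnss, var_gnss):
    """Per-axis inverse-variance weighting of the VIO and GNSS positions:
    the lower-variance source receives the larger weight."""
    w_vio = 1.0 / var_vio
    w_gnss = 1.0 / var_gnss
    p_fused = (w_vio * p_vio + w_gnss * p_gnss) / (w_vio + w_gnss)
    var_fused = 1.0 / (w_vio + w_gnss)
    return p_fused, var_fused

# Example: the VIO estimate has drifted; the 1 Hz GNSS fix pulls it back.
p_vio = np.array([102.3, 54.1, 1.2])    # metres, local frame
var_vio = np.array([4.0, 4.0, 9.0])     # grows with accumulated drift
p_gnss = np.array([100.8, 53.6, 0.9])   # GNSS fix converted to local frame
var_gnss = np.array([1.5, 1.5, 6.0])    # from the receiver accuracy estimate
p_fused, var_fused = fuse_position(p_vio, var_vio, p_gnss, var_gnss)
print("fused position:", p_fused, "fused variance:", var_fused)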
Our work is to construct an online VIO system for large-scale outdoor scenarios. Our experiments run on the ROS platform with an i7-8700 CPU, and GNSS data are exported at 1 Hz. We will conduct experiments on well-lit and poorly lit campus grounds, respectively. Preliminary results show that GNSS measurements effectively reduce the cumulative error and that the system performs better with the GNSS module, especially in dark environments.
Our research is also applicable to UAV (unmanned aerial vehicle) systems, which will be part of our future work.


