Indoor and Outdoor Evaluation of Visual-Inertial Localization Algorithms
Alvika Gautam, IIIT Delhi, India; Subodh Mishra and Srikanth Saripalli, Texas A&M University, USA
Robot localization is one of the most challenging problems in mobile robotics. For applications such as autonomous navigation, tracking and obstacle avoidance, a mobile robot must have a correct estimate of its current pose with respect to a fixed frame. This estimate must be robust and reliable, and it must have a high update rate so that the system can cope with sudden changes and the controller keeps functioning correctly. While numerous sensors can serve this purpose, cameras prove to be the least expensive solution. Visual Odometry (VO) is the process of estimating the pose (translation and orientation relative to the initial location over time) of a body using one or more cameras. Traditional VO pipelines first detect features in each frame and track them into the next frame. A motion estimation step follows, using 2D-2D, 3D-3D or 3D-2D feature correspondences, and finally a local optimization (for example, bundle adjustment) is performed to obtain the camera pose. While monocular VO is simpler to implement and requires less computation than stereo VO, it suffers from scale ambiguity and requires an initial acceleration to bootstrap. Stereo VO, on the other hand, does not have these issues, but it is computationally intensive. When an IMU is added to a VO system, the result is commonly referred to as Visual Inertial Odometry (VIO). For monocular systems, the addition of an IMU removes the scale ambiguity; for stereo systems, it helps to obtain a smoother trajectory with less drift and fewer abrupt changes, because IMUs have higher update rates than cameras. In this work, we review two recent Visual Inertial Odometry frameworks from the literature, VINS-Mono and PIRVS, and characterize their performance. We plan to demonstrate the advantages and disadvantages of each and to study the effect of various parameters on system performance when the frameworks are integrated with aerial robots and autonomous ground rovers.
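The scale ambiguity of monocular VO mentioned above can be illustrated with a small numerical sketch. It assumes an idealized pinhole camera with no rotation and a hypothetical focal length; the landmark and camera positions are made-up values, not data from our experiments. Scaling the whole scene and the camera translation by the same factor leaves every pixel measurement unchanged, so images alone cannot reveal the metric scale.

```python
def project(point, cam_position, focal=500.0):
    """Pinhole projection of a 3D point seen from a camera at cam_position.

    Assumption: the camera looks along +z and is not rotated.
    """
    x = point[0] - cam_position[0]
    y = point[1] - cam_position[1]
    z = point[2] - cam_position[2]
    return (focal * x / z, focal * y / z)


def scale_all(vec, s):
    """Multiply every component of a 3D vector by s."""
    return tuple(s * v for v in vec)


# Hypothetical scene: two landmarks and two camera positions (metres).
landmarks = [(1.0, 0.5, 4.0), (-0.5, 0.2, 6.0)]
cam_positions = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.1)]

s = 2.5  # arbitrary, unknown scale factor

# Pixel measurements for the true scene and for the scaled scene.
pixels_true = [project(p, c) for c in cam_positions for p in landmarks]
pixels_scaled = [
    project(scale_all(p, s), scale_all(c, s))
    for c in cam_positions
    for p in landmarks
]

# Largest pixel difference between the two scenes (zero up to rounding).
max_pixel_diff = max(
    abs(a - b)
    for pt, ps in zip(pixels_true, pixels_scaled)
    for a, b in zip(pt, ps)
)
print(max_pixel_diff)
```

An IMU or a second camera with a known baseline supplies the metric information that breaks this ambiguity.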
2. Overview of VINS-Mono and PIRVS
VINS-Mono is an open-source monocular visual inertial odometry algorithm. Its basic steps are measurement preprocessing, estimator initialization and tightly coupled monocular visual inertial odometry. We discuss the basic flow of the VINS-Mono pipeline next. The measurement preprocessing step involves a vision processing front-end and a concurrently running IMU pre-integration step. In the vision processing front-end, for each incoming image frame, existing features are tracked and new features are detected. Keyframes are also selected in this step according to a set of selection criteria. For the state update, IMU pre-integration is performed first, because it is computationally less demanding than re-propagating each IMU measurement. Estimator initialization is of paramount importance because scale is not observable with monocular vision, which makes it difficult to fuse IMU and visual measurements without knowledge of the initial state. A visual-SLAM-based algorithm is used to bootstrap the VIO algorithm. The system is given an initial acceleration and, as it accelerates and the camera view changes, this visual SLAM algorithm estimates the up-to-scale system trajectory. Aligning the trajectory generated by visual SLAM with the trajectory generated from the IMU helps to estimate the scale. Once the scale, the initial values and the biases are known, a tightly coupled nonlinear optimization method is used to fuse the visual and inertial measurements. A visual-inertial bundle adjustment (BA) formulation is employed to minimize the sum of the prior and the Mahalanobis norm of all measurement residuals to obtain a maximum a posteriori estimate. The frequency of the VIO described here is limited by the image capture frequency. Besides these basic modules, there are also relocalization and global pose graph optimization modules.
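The idea of recovering scale by aligning an up-to-scale visual trajectory with a metric IMU-derived trajectory can be sketched as a one-parameter least-squares fit. This is a deliberately simplified illustration, not the VINS-Mono initialization itself, which also has to recover the initial values and biases. The function name and the displacement values are hypothetical, and the two displacement lists are assumed to be expressed in the same frame and to cover the same time intervals.

```python
def estimate_scale(visual_disp, metric_disp):
    """Least-squares scale s that minimizes sum ||s * v_i - m_i||^2.

    visual_disp: up-to-scale displacements between frames (from vision).
    metric_disp: metric displacements over the same intervals (from the IMU).
    """
    num = 0.0
    den = 0.0
    for v, m in zip(visual_disp, metric_disp):
        num += sum(vi * mi for vi, mi in zip(v, m))
        den += sum(vi * vi for vi in v)
    if den == 0.0:
        # No visual motion at all: the scale cannot be determined.
        raise ValueError("visual displacements are all zero; scale is unobservable")
    return num / den


# Hypothetical up-to-scale displacements from the visual trajectory.
visual = [(0.10, 0.00, 0.00), (0.20, 0.10, 0.00), (0.10, 0.20, 0.05)]

# Synthetic metric displacements generated with a known scale, for checking.
true_scale = 2.0
metric = [tuple(true_scale * c for c in v) for v in visual]

scale = estimate_scale(visual, metric)
print(scale)
```

The error raised for zero displacement mirrors the practical requirement described above: without sufficient initial motion there is nothing to align, and the scale stays unobservable.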
Perceptin Robotics Vision System (PIRVS) is a stereo-camera-based visual inertial hardware setup that uses a stereo visual inertial odometry algorithm for pose estimation. The device has a dual-core ARM Cortex-A72 and quad-core ARM Cortex-A53 SoC, with GPU support provided by an ARM Mali-T860MP4. It has 2 GB of RAM and 8 GB of internal storage. The stereo visual inertial odometry algorithm has three major components: the image processing front-end, an EKF-based tightly coupled VIO module and a mapping module. The image processing front-end detects corner features in an image, performs histogram equalization and uses an optical flow algorithm to track features in new incoming frames. New features are detected when the number of tracked features falls below a certain threshold. The feature selection criterion ensures that no two features are too close to each other. Finally, a binary descriptor is used to describe each feature. Each detected feature is associated with a 3D point in the map. The tightly coupled stereo VIO algorithm uses an iterated Extended Kalman Filter (EKF) framework. When a new IMU measurement is received, the state is propagated (predicted) using kinematic equations, and the EKF update is performed when measurements from the visual data are received. A mapping module runs in parallel with the VIO algorithm. The map consists of a set of 3D map points and a set of keyframes. A frame is chosen as a keyframe if the pose associated with it has a heading very distinct from all the keyframes in the database or if the distance to any keyframe is larger than a threshold. New map points are generated from spatial and temporal correspondences. Temporal correspondences are obtained from the image processing front-end and spatial correspondences are obtained from the stereo images. Finally, each set of corresponding 2D features is triangulated into a 3D map point. The keyframe poses and the 3D locations of the features are refined through a standard bundle adjustment approach.
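The keyframe selection rule described above can be sketched as follows. This is an illustrative reading of the rule, not the PIRVS implementation: poses are reduced to planar (x, y, heading) tuples, the thresholds are hypothetical, and the distance condition is interpreted as "farther than the threshold from every stored keyframe".

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two headings, in radians."""
    d = (a - b + math.pi) % (2.0 * math.pi) - math.pi
    return abs(d)


def is_keyframe(pose, keyframes,
                heading_thresh=math.radians(20.0), dist_thresh=0.5):
    """Decide whether pose = (x, y, heading) should become a keyframe.

    Thresholds are hypothetical values chosen for illustration only.
    """
    if not keyframes:
        return True  # the first frame always seeds the database
    heading_distinct = all(
        angle_diff(pose[2], kf[2]) > heading_thresh for kf in keyframes
    )
    far_from_all = all(
        math.hypot(pose[0] - kf[0], pose[1] - kf[1]) > dist_thresh
        for kf in keyframes
    )
    return heading_distinct or far_from_all


keyframes = [(0.0, 0.0, 0.0)]
print(is_keyframe((0.1, 0.0, 0.05), keyframes))              # small motion
print(is_keyframe((1.0, 0.0, 0.0), keyframes))               # large translation
print(is_keyframe((0.1, 0.0, math.radians(45)), keyframes))  # large rotation
```

Wrapping the heading difference into the range from -pi to pi avoids treating headings of 350 and 10 degrees as far apart.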
While VINS-Mono and PIRVS are inherently different systems (one is a monocular and the other a stereo visual inertial odometry setup), performance comparisons can be drawn based on the methodologies followed in their respective visual inertial odometry/SLAM pipelines. As noted above, although VINS-Mono has a highly optimized monocular setup with robust initialization that can recover the system from unknown initial states, PIRVS, being a stereo setup, can start from rest without requiring any initial excitation. In addition, PIRVS claims to have an optimal SLAM system since the hardware and SLAM algorithms are designed together. The main advantage here is the synchronization between the IMU and the camera. Both VINS-Mono and PIRVS use tightly coupled visual-inertial sensor fusion, but PIRVS is more scalable, with the ability to include additional loosely coupled sensor modalities. Another significant difference between the two localization frameworks is that, unlike VINS-Mono, PIRVS does not perform explicit loop closure; instead, a local map is maintained. Despite the absence of loop closure, PIRVS claims to perform better than VINS-Mono in terms of absolute trajectory error, owing to a more robust filter design that enables tracking even in challenging environments.
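For reference, the absolute trajectory error used in such comparisons is commonly reported as the root-mean-square of the position differences between estimated and ground-truth poses. The sketch below assumes the two trajectories are already time-associated and expressed in the same frame; a full evaluation would normally include an alignment step first, which is omitted here. The function name and the sample trajectories are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square position error between two matched trajectories.

    Assumption: poses are already time-associated and frame-aligned.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical rectangular ground-truth path and an estimate offset in x.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
estimate = [(x + 0.1, y, z) for x, y, z in ground_truth]

error = ate_rmse(estimate, ground_truth)
print(error)
```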
We performed some preliminary tests for both VINS-Mono and PIRVS. Some of the observations are as follows:
a. The performance of VINS-Mono is highly sensitive to the initial movements provided during the initial calibration procedure, whereas PIRVS has a more robust offline camera-IMU extrinsic calibration procedure.
b. PIRVS detects and operates on more clearly visible corner features, unlike the denser feature map given by VINS-Mono, which might adversely affect the performance of PIRVS in areas with less prominent features.
An exhaustive evaluation of the two frameworks in different environment setups is required to test their performance for autonomous applications such as path planning, target tracking and autonomous landing. Some of our planned comparison approaches are discussed in the next section.
3. Proposed Methodology and Experiments
We plan to test the performance of VINS-Mono and PIRVS on both aerial and ground vehicles. Performance evaluation for aerial robots will cover both indoor and outdoor environments, whereas for ground rovers it will be outdoors under different environmental conditions and times of day. The hardware setup for testing VINS-Mono includes a Pixracer autopilot and a monocular oCam global shutter camera. We plan to use the camera in a downward-facing configuration for all the tests. Broadly, some of the test cases to be evaluated are as follows:
a. Effect of varying frame and IMU data rates on the performance of the algorithms.
b. Performance comparison under varying imaging and lighting conditions, and the effect of shadows, both indoors and outdoors.
c. Effect on performance of a significant change in lighting conditions, such as moving from outdoors to indoors and vice versa.
d. Performance of the algorithms and their loop-closure detection ability for varying mission durations.
e. Pose drift under stationary conditions and accuracy of pose estimation, in both cases.
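As one possible way to quantify the stationary drift in test case (e), the reported positions can be compared against the first reported position while the platform is held still. The sketch below is only an assumption about how such a metric might be computed; the function name and the sample positions are hypothetical, and orientation drift is not considered.

```python
import math


def stationary_drift(positions):
    """Return (final_drift, max_drift) relative to the first position.

    positions: sequence of (x, y, z) estimates logged while the platform
    is stationary, so any displacement is estimator drift.
    """
    if not positions:
        raise ValueError("at least one position is required")
    origin = positions[0]
    distances = [math.dist(p, origin) for p in positions]
    return distances[-1], max(distances)


# Hypothetical position log (metres) from a stationary run.
log = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.03, 0.04, 0.0), (0.02, 0.0, 0.0)]

final_drift, max_drift = stationary_drift(log)
print(final_drift, max_drift)
```

Reporting both values separates a steadily growing offset from a transient excursion that later recovers.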
Dataset collection will be a crucial part of these tests. For the sake of consistency, we also plan to collect image and IMU data from the camera-IMU-synchronized PIRVS hardware and evaluate the performance of VINS-Mono on the same data.
4. Preliminary Tests and Results
The video URL shows preliminary results of data runs performed with two sensor suites: a Pixracer autopilot with an oCam global shutter camera, and the PIRVS hardware. The Pixracer and oCam data were collected with the camera in a downward-facing configuration. This dataset was used as input to VINS-Mono, and an rviz visualization of the resulting trajectory can be seen in the video (left).
In the preliminary data collection, PIRVS was not able to find many features in the downward-facing configuration, so it was tested with a forward-facing camera configuration instead; the trajectory, plotted in MATLAB, is shown in the video (right). In these respective configurations, PIRVS can be seen to perform better than VINS-Mono in terms of the estimated trajectory and loop. We plan to perform further indoor and outdoor tests to validate both systems accurately in forward-facing and downward-facing configurations.
N.B. The left and right videos are not synchronized because the data were not collected simultaneously, but both sensor suites were run over the same rectangular trajectory.