VO-Sim: A Generic Framework for Tuning and Evaluating Visual Odometry Systems
Islam Alaa and Amr Wassal, Cairo University, Egypt
Visual odometry (VO) is gaining popularity across many disciplines and applications, from smart driving and robotics to surveillance and search-and-rescue missions, among many others. This emerging popularity is due to the availability of high-performance, low-cost cameras and the impressive advances in the accuracy and precision of different VO algorithms. Two of the main issues designers of such algorithms confront are tuning the various system parameters and testing and evaluating their algorithms across a wide range of configurations.
Very little work is available in the literature to address the issue of parameter tuning for visual odometry systems and to study the effect of each tuning parameter on the overall performance of the system. Moreover, the literature lacks a generic framework that can be used to test and compare different parameter sets with respect to system accuracy and run-time.
In one paper addressing this problem, the authors presented a simulator for stereo VO systems built to evaluate performance with respect to different motion estimation algorithms and the number of key points gathered from each frame. The feature extractor and the feature matching algorithm used for motion estimation were neither specified nor evaluated in that work. The simulator covered only the basic steps of stereo VO, without any pre- or post-processing steps such as bundle adjustment, state estimation, or fusion. In another paper, the effect of using four different feature extractors, namely SIFT, SURF, ORB, and A-KAZE, in monocular VO was extensively studied. Depth recovery techniques were used to extract depth information, and a state update step was added to the pipeline to further improve the algorithm's stability. Each feature extractor was tested 48 times and the results were averaged to assess its stability and reliability; the performance distribution across runs was reported to characterize each extractor's stability under identical conditions. However, this work was limited to feature extractors and their effect on performance, and was not extended to the other factors and parameters present in the monocular VO pipeline. Moreover, neither of these contributions provides a flexible framework for researchers to experiment with.
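The repeated-run methodology described above can be sketched as follows. This is a minimal illustrative stand-in, not the cited paper's code: the function name, the synthetic error model, and the Gaussian noise parameters are all assumptions made purely to show the averaging scheme.

```python
import random
import statistics

def evaluate_feature(extractor_name, runs=48, seed=0):
    """Illustrative stand-in for one stability experiment: each run yields
    a trajectory error perturbed by run-to-run nondeterminism (e.g. RANSAC).
    The error model here is synthetic, not measured data."""
    rng = random.Random(seed + len(extractor_name))  # deterministic per extractor
    errors = [1.0 + rng.gauss(0.0, 0.1) for _ in range(runs)]
    return statistics.mean(errors), statistics.stdev(errors)

# Average each extractor over 48 runs and report mean and spread,
# mirroring the stability study described above.
for name in ("SIFT", "SURF", "ORB", "A-KAZE"):
    mean_err, std_err = evaluate_feature(name)
    print(f"{name}: mean={mean_err:.3f}, std={std_err:.3f}")
```

Reporting the spread alongside the mean is what distinguishes a stability study from a plain accuracy benchmark: two extractors with the same mean error can behave very differently run to run.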
Generally, Conventional Monocular Visual Odometry (CMVO) systems consist of consecutive blocks that jointly determine the overall system performance in terms of both complexity and accuracy. Tuning a single parameter in isolation does not guarantee a direct improvement in performance. On the contrary, tuning each parameter separately while disregarding its effect on the other blocks in the system may severely degrade both accuracy and execution time. Consequently, a framework is needed to properly study the mutual impact of different parameters on system performance. This framework should also facilitate rapid prototyping of new CMVO blocks, and should provide a more systematic and robust methodology for selecting and tuning the proper parameter set to achieve optimum performance.
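The pitfall of one-at-a-time tuning can be made concrete with a toy example. The parameter names and error values below are entirely hypothetical, chosen only to show how coupled parameters defeat sequential tuning:

```python
import itertools

# Hypothetical end-to-end trajectory error for combinations of two coupled
# parameters (illustrative numbers, not from any real experiment).
error = {
    ("fast_thresh=10", "ransac=100"): 1.0,   # defaults
    ("fast_thresh=40", "ransac=100"): 2.0,
    ("fast_thresh=10", "ransac=500"): 2.0,
    ("fast_thresh=40", "ransac=500"): 0.4,   # joint optimum
}
thresh_vals = ["fast_thresh=10", "fast_thresh=40"]
ransac_vals = ["ransac=100", "ransac=500"]

# One-at-a-time tuning starting from the defaults: each step keeps the
# default value, because moving either parameter alone makes things worse.
best_t = min(thresh_vals, key=lambda t: error[(t, "ransac=100")])
best_r = min(ransac_vals, key=lambda r: error[(best_t, r)])
one_at_a_time = error[(best_t, best_r)]      # stuck at 1.0

# Joint search over the full grid finds the coupled optimum.
joint = min(error[p] for p in itertools.product(thresh_vals, ransac_vals))
print(one_at_a_time, joint)  # 1.0 vs 0.4
```

Because changing either parameter alone increases the error, sequential tuning never leaves the default configuration, while a joint search reaches the better combined setting; this is precisely the coupling the framework is meant to expose.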
In this paper, we propose a highly controllable and observable CMVO framework that enables tuning of different parameter sets and monitors not only the overall performance metrics but also the intrinsic metrics of the internal CMVO blocks.
This framework is built to run two CMVO engines simultaneously, in a round-robin manner, with two separate parameter sets, and it can be extended to more than two engines. Each engine has the generic basic blocks of CMVO: image acquisition, feature detection, sparse feature matching using optical flow, essential matrix estimation, decomposition into rotation and translation matrices, and accumulation of estimated poses. Each engine also has two wrappers, one at the beginning of the pipeline as a placeholder for user-defined pre-processing procedures and one at the end as a placeholder for user-defined post-processing procedures. Moreover, scale information is assumed to be provided by an external source or an external user-defined block. In addition, a dataset manager interfaces with the dataset files and supplies the proper images to the two CMVO engines. Finally, a scoreboard is updated every iteration with the performance metrics measured during operation, providing an online performance monitor as well as a final detailed performance report. The scoreboard also features a live screen showing the 2D trajectories estimated by the two CMVO engines against ground truth.
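A minimal sketch of how such a framework might be organized is shown below. All class and function names are illustrative, not the paper's actual API; the motion-estimation block is a stub standing in for the feature-detection, optical-flow, and essential-matrix stages, and scale is passed in externally as the text describes.

```python
class CMVOEngine:
    """One CMVO pipeline instance configured by its own parameter set."""

    def __init__(self, params, pre=None, post=None):
        self.params = params
        self.pre = pre or (lambda img: img)      # user-defined pre-processing hook
        self.post = post or (lambda pose: pose)  # user-defined post-processing hook
        self.trajectory = [(0.0, 0.0)]           # accumulated 2D positions

    def step(self, image, scale):
        image = self.pre(image)
        # Stand-in for: feature detection -> optical-flow matching ->
        # essential-matrix estimation -> R,t decomposition.
        dx, dy = self._estimate_motion(image)
        x, y = self.trajectory[-1]
        pose = (x + scale * dx, y + scale * dy)  # pose accumulation, external scale
        self.trajectory.append(self.post(pose))

    def _estimate_motion(self, image):
        # Stub motion estimate; a real engine would run the blocks above.
        return (1.0, 0.0)


class Scoreboard:
    """Per-iteration metric collector for all engines."""

    def __init__(self):
        self.errors = {}

    def update(self, name, engine, gt):
        x, y = engine.trajectory[-1]
        gx, gy = gt
        self.errors.setdefault(name, []).append(((x - gx) ** 2 + (y - gy) ** 2) ** 0.5)


# Dataset-manager stand-in: frames paired with ground-truth positions.
frames = [("frame%d" % i, (float(i + 1), 0.0)) for i in range(3)]

engines = {"A": CMVOEngine({"n_features": 500}),
           "B": CMVOEngine({"n_features": 1000})}
board = Scoreboard()

for image, gt in frames:                 # round-robin: every engine sees each frame
    for name, eng in engines.items():
        eng.step(image, scale=1.0)
        board.update(name, eng, gt)
```

The round-robin loop is what keeps the comparison fair: both parameter sets process identical frames in lockstep, so the scoreboard's per-iteration errors are directly comparable.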
The need for a generic, extendable framework and simulator for VO engines, along with the design considerations it must account for, is discussed in detail. We also present a qualitative overview and discussion of the efforts made to define this framework and address this problem. The paper further discusses, briefly, the available options for extending the framework's functionality to cover a wider range of VO engines and algorithms. Finally, the tuning time and effort saved are estimated and compared with those of traditional tuning methods to demonstrate the effectiveness of the framework.