LiDAR Data Enrichment Using Deep Learning Based on High-Resolution Image: An Approach to Achieve High-Performance LiDAR SLAM Using Low-Cost LiDAR
Jiang Yue, Hong Kong Polytechnic University & Nanjing University of Science and Technology, China; Weisong Wen, Hong Kong Polytechnic University, China; Jing Han, Nanjing University of Science and Technology, China; Li-Ta Hsu, Hong Kong Polytechnic University, China
Location: Galleria I/II
Alternate Number 1
LiDAR plays an irreplaceable role in the realization of Level 4 (L4) autonomous driving vehicles. It is used not only for object detection but also for localization, including LiDAR simultaneous localization and mapping (SLAM) and LiDAR map matching. The performance of LiDAR localization is satisfactory, but the price is not. For L4 autonomous vehicles to be publicly accepted, the cost of the vehicle is the foremost issue right after the safety and integrity of the vehicle technology. Many studies aim to develop low-cost LiDAR. For solid-state LiDAR (Poulton et al., 2017), optical phased arrays were developed on a silicon photonics platform; however, the range and speed are insufficient, and the insertion loss of the laser power is a drawback. Another promising solution is MEMS-based LiDAR (Yoo et al., 2018), but the robustness of such systems remains an unsolved problem. This paper serves the same purpose, but from the angle of software. We aim to improve the performance of a 16-channel Velodyne LiDAR to the level of a 32-channel one. The core idea is to use the high-resolution images generated by the cameras originally available on the vehicle to conduct depth completion of the low-cost LiDAR based on deep learning.
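The depth completion pipeline described above starts from a sparse depth map obtained by projecting the LiDAR point cloud into the camera image plane. A minimal sketch of that projection step, assuming hypothetical calibration matrices `K` (camera intrinsics) and `T` (LiDAR-to-camera extrinsics) that in practice come from sensor calibration:

```python
import numpy as np

def lidar_to_sparse_depth(points, K, T, h, w):
    """Project LiDAR points (N x 3, sensor frame) into an h x w camera
    image to form the sparse depth map fed to a completion network.
    K: 3x3 intrinsics, T: 4x4 LiDAR-to-camera extrinsics (placeholders)."""
    # Transform points to the camera frame (homogeneous coordinates).
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (T @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]              # keep points in front of the camera
    # Perspective projection onto the image plane.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v, z = uv[:, 0].astype(int), uv[:, 1].astype(int), cam[:, 2]
    # Keep only projections that land inside the image bounds.
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.zeros((h, w), dtype=np.float32)  # 0 marks "no measurement"
    # Later points overwrite earlier ones; a real pipeline keeps the nearest.
    depth[v[ok], u[ok]] = z[ok]
    return depth
```

For a 16-channel LiDAR this map covers well under 5% of the image pixels, which is exactly the sparsity the completion network must overcome.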
The data enrichment approach is well known as super-resolution in the field of computer vision (Yang, Yang, Davis, & Nistér, 2007). The super-resolution idea started with filtering methods and has drawn much attention, especially with deep learning (Dong et al., 2016; Hui, Loy, & Tang, 2016). Given the great success of deep learning, a common approach is to feed sparse depth maps and high-resolution images to a neural network to train a model. In autonomous driving applications such as lane keeping, the lane marking width is normally only around 0.1 meter, so depth accuracy at this level matters. Since depth completion is an ill-posed problem, a neural network can achieve a reasonable result on the training dataset, but the predicted result still falls far short of expectations. To deal with the insufficiency of conventional networks, a sparse convolution network that explicitly considers the location of missing data was proposed to realize super-resolution on sparse depth, achieving a mean absolute error (MAE) of about 0.54 meter (Uhrig et al., 2017). This result is the depth-only baseline for completion on the KITTI depth dataset. There is also a large body of work estimating depth from a single color image alone, where deep learning has undoubtedly achieved remarkable results. Compared with traditional algorithms such as Markov Random Fields (MRF) and conjugate gradient, the new depth features employed in neural networks have significantly reduced the MAE of the estimated depth. Surface normals, used as a new local depth representation, were proposed to predict the depth of neighboring pixels, reducing the MAE to about 0.226 meter (Qiu et al., 2019). Similarly to the guided image filter, each pixel can also be treated as a weighted average of nearby pixels: the weights are inferred from the image by a neural network and applied to the sparse depth to produce a dense depth map (Tang, Tian, Feng, Li, & Tan, 2019). This approach reduced the MAE to around 0.218 m and currently ranks first on the KITTI leaderboard.
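The weighted-average idea behind image-guided completion can be illustrated with a toy sketch that propagates sparse depth using hand-crafted intensity-similarity weights. In Tang et al. (2019) these weights are learned by a network, so the Gaussian kernel here (with illustrative parameters `radius` and `sigma`) is only a stand-in:

```python
import numpy as np

def guided_fill(sparse_depth, image, radius=4, sigma=0.1):
    """Toy image-guided depth propagation: each empty pixel becomes a
    weighted average of nearby measured depths, weighted by intensity
    similarity in the guidance image (a hand-crafted stand-in for the
    network-predicted weights of guided depth completion)."""
    h, w = sparse_depth.shape
    dense = sparse_depth.copy()
    ys, xs = np.nonzero(sparse_depth)          # measured pixel locations
    for y in range(h):
        for x in range(w):
            if dense[y, x] > 0:
                continue                       # keep real measurements
            # Select measurements inside the local window.
            m = (np.abs(ys - y) <= radius) & (np.abs(xs - x) <= radius)
            if not m.any():
                continue                       # no support: leave empty
            ny, nx = ys[m], xs[m]
            # Gaussian weight on guidance-image intensity difference.
            wgt = np.exp(-((image[ny, nx] - image[y, x]) ** 2)
                         / (2 * sigma ** 2))
            dense[y, x] = np.sum(wgt * sparse_depth[ny, nx]) / np.sum(wgt)
    return dense
```

The learned version replaces the fixed kernel with per-pixel weights predicted from the image, which is what lets the network respect depth discontinuities at object boundaries.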
Different from traditional pattern recognition problems, depth completion is more of a measurement problem, and it belongs to the class of so-called ill-posed problems. Unfortunately, there are few datasets available to train the neural network; the most widely used KITTI depth completion dataset contains about 86,000 training frames. More critically, in a city like Hong Kong the traffic is heavy and the scene is highly dynamic, which poses a serious challenge for depth completion. This paper will show the poor results of state-of-the-art depth completion methods from the KITTI leaderboard when they are tested on a Hong Kong dataset: compared with their outstanding MAE of 0.2 meter on the KITTI test data, the MAE degrades to about 0.5 m on our test dataset collected in Hong Kong. Then, we show that with the state-of-the-art depth completion method, SLAM using a 16-channel LiDAR can achieve performance similar to that using a 32-channel one. However, LiDAR SLAM is also strongly affected by dynamic objects, which makes it unsatisfactory for autonomous driving applications. Finally, we conclude that there are two significant issues: highly dynamic roads cause (1) occlusions and (2) scene changes, both of which affect LiDAR depth completion and LiDAR SLAM in traffic-dense areas. Our future work will focus on the development of a new LiDAR depth completion method to mitigate the above-mentioned issues.
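The cross-dataset comparison above relies on per-pixel error metrics computed only where ground-truth depth exists, since LiDAR-derived ground truth is itself semi-dense. A minimal sketch of that KITTI-style evaluation (the function name `depth_errors` is ours, not from any benchmark toolkit):

```python
import numpy as np

def depth_errors(pred, gt):
    """Compare a predicted dense depth map against semi-dense ground
    truth: only pixels with a valid ground-truth depth (gt > 0)
    contribute to the error, as in the KITTI depth completion protocol."""
    valid = gt > 0                         # mask of pixels with ground truth
    diff = pred[valid] - gt[valid]
    mae = np.mean(np.abs(diff))            # mean absolute error (meters)
    rmse = np.sqrt(np.mean(diff ** 2))     # root mean square error (meters)
    return mae, rmse
```

Because the valid mask differs between KITTI and a Hong Kong dataset (occlusions and moving objects remove ground-truth pixels unevenly), the same model can score very differently on the two, which is exactly the degradation reported above.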