Session C4: Positioning Technologies and Machine Learning

An ORB-Based SLAM Using Deep Learning for Dynamic Environments
Yiheng Zhao and Hongyang Yu, Research Institute of Electronic Science and Technology, University of Electronic Science and Technology of China (UESTC)

Real-time 3D reconstruction is an important topic in the field of Simultaneous Localization and Mapping (SLAM). Most current indoor 3D reconstruction methods assume that the scene is static, so dynamic objects interfere with the reconstruction process. A large number of dynamic objects can severely degrade the accuracy of the reconstructed map and leave ghost images in the final result. Effective elimination of dynamic objects from the reconstruction environment has therefore become a key topic in SLAM research. Before the rise of deep-learning-based approaches, traditional methods had several drawbacks: (1) poor real-time performance, as the majority of solutions adopted offline processing and could not filter dynamic objects in real time; (2) a mutual dependency, in that computing the camera's own motion model requires static landmarks in the environment, while identifying static landmarks in turn requires the camera's own motion model; and (3) difficulty in identifying potentially dynamic objects, such as people standing still or cars parked at the roadside, which can start moving at any moment. Deep-learning-based object detection models can effectively tackle these challenges. For instance, convolutional neural networks (CNNs) stack convolutional layers to extract features at different scales, which enables object detection and classification. Compared with traditional methods based on geometric constraints, deep neural networks can categorize objects and identify potentially dynamic targets with improved accuracy and efficiency. Introducing an appropriate deep learning model into the elimination of dynamic objects can therefore improve the precision of reconstruction.
This paper proposes a deep neural network and integrates it into a traditional SLAM pipeline, achieving both high accuracy and mapping efficiency in the presence of dynamic object interference. The research is divided into two stages. The first stage trains a deep neural network on an open-source dynamic environment dataset to identify and eliminate dynamic objects in the RGB images. The second stage builds a real-time system for dynamic environments on top of a 3D reconstruction algorithm based on ORB-SLAM2. Integrating the deep neural network with traditional SLAM yields our basic SLAM prototype. Using the feature map from which dynamic object features have been removed, we then obtain more reliable matching results and improve the camera motion calculation and pose estimation between image frames, resulting in a more reliable local map. Finally, through steps such as loop closure detection and alignment optimization, the point cloud maps of the key frames are fused and adjusted to produce a visualized 3D map. By combining traditional SLAM with deep neural networks, we have established a better-performing, more stable RGB-D SLAM system.
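The step of discarding features that belong to dynamic objects before matching can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the network output is available as a per-pixel boolean mask, and the names `filter_static_keypoints`, `keypoints_xy`, and `dynamic_mask` are hypothetical. The keypoint coordinates stand in for ORB keypoint locations.

```python
import numpy as np


def filter_static_keypoints(keypoints_xy, dynamic_mask):
    """Keep only keypoints that fall outside the dynamic-object mask.

    keypoints_xy : (N, 2) array of (x, y) pixel coordinates (hypothetical
                   input, e.g. ORB keypoint locations).
    dynamic_mask : (H, W) boolean array, True where a pixel is assumed to
                   have been labelled as part of a dynamic object.
    Returns the static keypoints and a boolean keep-flag per input keypoint.
    """
    pts = np.asarray(keypoints_xy, dtype=float).reshape(-1, 2)
    h, w = dynamic_mask.shape
    # Round to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.rint(pts[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(pts[:, 1]).astype(int), 0, h - 1)
    # A keypoint is kept when its pixel is not marked as dynamic.
    keep = ~dynamic_mask[rows, cols]
    return pts[keep], keep


# Toy example: one rectangular "dynamic" region in a 640x480 frame.
mask = np.zeros((480, 640), dtype=bool)
mask[100:300, 200:400] = True  # rows 100..299, columns 200..399
kps = np.array([[50.0, 50.0], [250.0, 150.0], [600.0, 400.0]])
static_kps, keep = filter_static_keypoints(kps, mask)
```

In this toy case only the second keypoint lies inside the masked region, so it is the only one dropped. The same keep-flags could be used to drop the corresponding descriptors before frame-to-frame matching.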


