Panoramic Perception of Aircraft Taxiing Guidance Based on Onboard Cameras
Shuguang Zhang, Hongxia Wang, Kun Fang, Beihang University; Kelin Zhong, Commercial Aircraft Corporation of China; Zhipeng Wang, Hongwu Liu, Beihang University
Location: Beacon A
In recent years, the rapid development of the civil aviation industry has brought unprecedented opportunities and challenges. The continuous increase in air traffic has placed airports, as the starting point of the air transportation system, under growing operational and management pressure, raising higher requirements for airport management and operations. Against this backdrop, airport taxi guidance systems have become increasingly important. Currently, owing to lagging infrastructure development, taxi guidance at most airports still relies on voice communication between controllers and flight crews, which can easily lead to guidance errors, especially during busy operations, in low-visibility conditions, or when ground markings are unclear, thereby increasing the risk of runway incursions. Moreover, although taxiway positioning typically depends on GNSS services, not all airports receive GNSS signals consistently, particularly those in remote locations or with inadequate infrastructure; weather, atmospheric disturbances, and electromagnetic interference can further degrade positioning accuracy and taxiing safety. In recent years, several runway incursions worldwide have been caused by taxi guidance errors, significantly impacting airport safety management.
In 2003, the International Civil Aviation Organization (ICAO) proposed the Advanced Surface Movement Guidance and Control System (A-SMGCS). Its successful application at some major airports, such as Beijing Daxing International Airport, has demonstrated that continuously improving automation levels can effectively alleviate operational management challenges. At the same time, there is growing demand for unmanned technology in the airside operations of civil airports. The European Union Aviation Safety Agency (EASA) has begun advocating the gradual integration of human-supervised, AI-based automated decision-making systems into the existing air traffic control system to make taxi guidance more intelligent.
Therefore, an auxiliary taxi guidance system is needed that does not rely on external GNSS signals and enables aircraft to move autonomously between the gate or apron and the runway under human supervision. Onboard vision cameras, whose sensing closely resembles human visual perception, require no modification to existing onboard equipment and are already being used to verify ATC broadcast errors and to monitor surface obstacles in real time.
However, the power and computational resources available in the rear cabin of the aircraft are very limited, making it extremely challenging to perform multiple visual detection tasks simultaneously in real-world scenarios. Numerous network architectures have been developed for individual tasks, such as the YOLO series for object detection, PSPNet for semantic segmentation, and ADNet for guidance line recognition, but running these networks sequentially typically takes longer than handling the tasks concurrently. YOLOP is a feedforward network designed for complex autonomous driving tasks; it uses a shared encoder and three task-specific decoders, avoiding complex and redundant shared modules between the decoders and allowing efficient end-to-end training.
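To make this shared-encoder, multi-decoder layout concrete, the following PyTorch sketch shows how the three taxiing tasks can branch off a single backbone. The module names, channel widths, and head designs are illustrative assumptions, not the published YOLOP code.

```python
# Minimal sketch of a YOLOP-style multi-task layout: one shared encoder,
# three task-specific decoders (detection, drivable area, guidance line).
# Layer choices and channel sizes are illustrative placeholders.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Placeholder backbone producing a single feature map at 1/8 resolution."""
    def __init__(self, out_channels: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

class SegHead(nn.Module):
    """Simple decoder that upsamples the shared features to a per-pixel mask."""
    def __init__(self, in_channels: int, num_classes: int, scale: int = 8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1),
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
        )

    def forward(self, feats):
        return self.head(feats)

class MultiTaskTaxiNet(nn.Module):
    """Shared encoder with three decoders, trained jointly end to end."""
    def __init__(self, num_det_outputs: int = 3 * (5 + 3)):
        super().__init__()
        self.encoder = SharedEncoder()
        self.detect_head = nn.Conv2d(256, num_det_outputs, 1)  # per-anchor box/objectness/class maps
        self.drivable_head = SegHead(256, num_classes=2)        # drivable area vs. background
        self.line_head = SegHead(256, num_classes=2)            # guidance line vs. background

    def forward(self, x):
        feats = self.encoder(x)
        return {
            "detection": self.detect_head(feats),
            "drivable_area": self.drivable_head(feats),
            "guidance_line": self.line_head(feats),
        }

if __name__ == "__main__":
    model = MultiTaskTaxiNet()
    outputs = model(torch.randn(1, 3, 384, 640))
    for name, tensor in outputs.items():
        print(name, tuple(tensor.shape))
```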
In terms of visual taxi guidance, Quentin et al. proposed a sliding-window-based solution that combines visual information from a camera mounted under the aircraft's nose with airport map data. However, the limited field of view of the nose camera and the error accumulation of the sliding window pose challenges. In addition, although taxiway designs, markings, and signs have been standardized, traditional edge detection algorithms struggle to extract the shape of guidance lines accurately under heavy marking wear, low visibility, slippery surfaces, and light snow.
Therefore, we propose an improved YOLOP multi-task network that uses onboard forward-looking cameras to simultaneously perform guidance line extraction, drivable area segmentation, and obstacle detection, serving as a vision-based auxiliary system for airport surface taxi guidance. Additionally, to facilitate future research and evaluation of similar tasks, we have created a video dataset of aircraft taxiing from the main cockpit perspective based on onboard forward-looking cameras. Our work primarily includes the following contributions:
Firstly, we created a small forward-view dataset for aircraft taxi guidance to support deep-learning-based onboard vision applications. Because existing datasets of airport environments are scarce and their annotations are limited, especially for guidance lines and runway-specific objects, we built a new dataset covering simulated environments at several major airports in China under various weather conditions. The dataset comprises 100 videos captured at 2K resolution and post-processed to 720p, yielding 4,000 training images, 500 validation images, and 500 test images, with a focus on object detection, guidance line marking, and drivable area segmentation. The images span a range of weather and visibility conditions and provide bounding box annotations for three categories, "aircraft," "vehicles," and "pedestrians," as well as detailed guidance line markings to support accurate positioning and trajectory planning for taxi guidance tasks. The dataset is designed to improve the generalization ability of taxi guidance systems under diverse conditions, laying a foundation for future research and applications.
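To make the annotation structure concrete, the sketch below shows one way the per-frame labels (boxes for the three object categories plus drivable-area and guidance-line masks) could be organized and loaded. The directory layout, file names, and JSON keys are assumptions made for illustration, not the dataset's published format.

```python
# Sketch of a loader for per-frame annotations: detection boxes plus two
# segmentation masks. Paths, file names, and JSON keys are hypothetical.
import json
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

CLASSES = ("aircraft", "vehicle", "pedestrian")  # assumed label names

class TaxiGuidanceDataset(Dataset):
    """720p frames with detection boxes, drivable-area masks, and guidance-line masks."""

    def __init__(self, root: str, split: str = "train"):
        self.root = Path(root) / split                     # e.g. data/train, data/val, data/test
        self.frames = sorted((self.root / "images").glob("*.png"))

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        img_path = self.frames[idx]
        stem = img_path.stem
        image = np.asarray(Image.open(img_path).convert("RGB"))
        # Bounding boxes stored as a list of {"class": ..., "bbox": [x1, y1, x2, y2]}
        boxes = json.loads((self.root / "boxes" / f"{stem}.json").read_text())
        labels = [CLASSES.index(b["class"]) for b in boxes]
        # Binary masks for the two segmentation tasks
        drivable = np.asarray(Image.open(self.root / "drivable" / f"{stem}.png"))
        lines = np.asarray(Image.open(self.root / "lines" / f"{stem}.png"))
        return {
            "image": image,
            "boxes": np.array([b["bbox"] for b in boxes], dtype=np.float32),
            "labels": np.array(labels, dtype=np.int64),
            "drivable_mask": drivable,
            "line_mask": lines,
        }
```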
Secondly, considering the challenges posed by the elevated, forward-facing viewpoint of onboard cameras and by the complex curvature and numerous intersections of airport taxi guidance lines, we modified the YOLOP model. The Graph Convolutional Network (GCN) is a deep learning model designed for graph-structured data: it extends convolution to graphs, enabling learning and propagation of node features. Its core idea is to exploit the adjacency structure of the graph, aggregating each node's features with those of its neighbors to capture relationships between nodes. By incorporating a GCN module, the model can handle complex spatial relationships more effectively, improving its ability to recognize curved and intersecting guidance lines and to capture their geometric shapes and curve features. In addition, the model replaces standard convolution with the Spatial and Channel Reconstruction Convolution (SCConv) module. SCConv refines features through a Spatial Reconstruction Unit (SRU) and a Channel Reconstruction Unit (CRU), so that the model extracts key information efficiently while processing high-dimensional features, reduces feature redundancy, and lowers computational cost without sacrificing performance.
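The following sketch illustrates the neighbor-aggregation step of such a GCN layer, using the standard symmetrically normalized adjacency with self-loops. How the graph nodes are constructed from the feature map (here, points sampled along a candidate guidance line) is an assumption for illustration, not the specific design of our module.

```python
# Minimal graph-convolution layer: each node's features are aggregated with its
# neighbors' via a normalized adjacency matrix, H' = D^{-1/2}(A + I)D^{-1/2} H W.
# The node construction below (points chained along a line) is hypothetical.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, node_feats: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, in_dim); adjacency: (N, N), 1 where nodes are connected
        a_hat = adjacency + torch.eye(adjacency.size(0), device=adjacency.device)  # add self-loops
        deg_inv_sqrt = a_hat.sum(dim=1).clamp(min=1).pow(-0.5)
        norm_adj = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]           # symmetric normalization
        return torch.relu(norm_adj @ self.weight(node_feats))                      # aggregate, transform, activate

# Example: 6 nodes sampled along a curved guidance line, each linked to its neighbors
adj = torch.zeros(6, 6)
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
layer = GCNLayer(in_dim=32, out_dim=32)
refined = layer(torch.randn(6, 32), adj)   # (6, 32) neighbor-aware node features
print(refined.shape)
```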
Lastly, we validated the model's performance in a laboratory environment and deployed it on a low-power system to evaluate its accuracy and usability under different conditions, including tests of guidance line extraction, guidance line tracking, generalization ability, and computational complexity. We compared the results with the edge-detection-based work of Quentin et al. The evaluation showed that the success rate of guidance line recognition exceeded 80% and that the model generalized better in line tracking. Even on an embedded computing platform, the model sustained more than 15 FPS, highlighting the significant potential of camera vision for aircraft-assisted taxi guidance.
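As an illustration of how such throughput figures can be obtained, the sketch below averages the latency of repeated forward passes after a warm-up phase and reports frames per second. The input resolution, iteration counts, and device string are arbitrary choices for illustration, not the benchmarking protocol used in our tests.

```python
# Sketch of a throughput measurement: warm up, then time repeated forward
# passes and report frames per second. Parameters are illustrative.
import time
import torch

def measure_fps(model, input_size=(1, 3, 384, 640), warmup=20, iters=100, device="cpu"):
    model = model.to(device).eval()
    dummy = torch.randn(*input_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):                    # warm-up to stabilize clocks and caches
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return iters / elapsed

# e.g. fps = measure_fps(MultiTaskTaxiNet(), device="cuda")
# (using the multi-task sketch above; sustained >15 FPS would meet the target reported here)
```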