
Session F2: Advanced Software and Hardware Technologies for GNSS Receivers

Multimodal Learning for Reliable Interference Classification in GNSS Signals
Tobias Brieger, Friedrich-Alexander-University (FAU), Fraunhofer Institute for Integrated Circuits (IIS); Nisha Lakshmana Raichur, Dorsaf Jdidi, Felix Ott, IIS; Tobias Feigl, FAU/IIS; J. Rossouw van der Merwe, Alexander Rügamer, Wolfgang Felber, IIS
Date/Time: Wednesday, Sep. 21, 2:35 p.m.

Objectives
Interference signals affect the processing chain of a Global Navigation Satellite System (GNSS) receiver and thus degrade its localization accuracy. Therefore, potential interference signals must be mitigated or the potential transmitter (i.e., jammer) eliminated. However, to successfully remove interference signals, they must first be detected and then localized. In addition, successfully classifying the waveform of an interference signal helps to deduce the signal's purpose and thus simplifies its localization. Recently, snapshot-based data-driven methods, such as Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) [10], have outperformed classic model-based techniques, such as pattern recognition and mathematical formulations, as they achieve high classification accuracy even in challenging scenarios. One reason is that they learn an approximate mapping directly from data that implicitly describes non-deterministic or nonlinear functions, without any additional modeling effort. However, these classical learning-based methods either do not use time and context at all (e.g., Random Forest [2], SVM [11], and CNN [4]), consider only time-dependent local phenomena (i.e., spatial features) in snapshots (e.g., ResNet [5] and Temporal CNN (TCN) [14]), or consider only time-dependent global phenomena (i.e., time-sensitive features) in sequences while ignoring local phenomena (e.g., Recurrent Neural Networks (RNNs)).
To improve the mitigation of interference signals and incorporate both spatial and time-sensitive features, we propose a novel Multimodal Learning (MML) system that improves classification accuracy beyond the state-of-the-art, accounts for the uncertainty of its estimates, and significantly lowers computation and energy costs. To this end, our MML approach combines the most prominent methods from the literature that claim to provide the most robust and accurate classification, enabling a multimodal embedding of the inputs. Our experiments show that our MML framework with a late fusion mechanism implicitly learns to weigh spatial features (images of spectrum data) against time-sensitive features (matrices of raw IQ samples) and thus provides reliable interference classification. To evaluate state-of-the-art methods and our MML framework, we use realistic, deterministic data from our large-scale real measurement campaign covering five sources of interference signals and multipath effects. To the best of our knowledge, we are the first to investigate data-driven multimodal fusion methods on real-world data to create energy-efficient, multipath-resistant classification algorithms that can adapt to various types of input and their artifacts.
Methodology
By fusing the raw data of a typical mid-range sensor (at a low sampling rate) with characteristics of a low-cost sensor (at a high sampling rate), we improve the accuracy and robustness of our MML classifier. Our MML aims to extract meaningful information from both modalities and uses the combined feature representation to estimate accurate classification results. In general, fusion may be performed at the decision level (i.e., late fusion) or at intermediate feature levels (i.e., intermediate fusion). Although experiments in neuroscience [8] and Machine Learning (ML) [12] suggest that mid-level fusion may promote learning, late fusion is still the predominant method in multimodal learning [1, 9]. Therefore, we evaluate both fusion techniques for our MML method: (1) late fusion, which fuses high-level features from the respective sources before we use them for classification; and (2) intermediate fusion, which fuses features at intermediate levels. For the latter, we use the Multimodal Transfer Module (MMTM) [7], which can be added at different levels of the feature hierarchy to enable slow modality fusion. At each fusion level, we further investigate the importance of attention-based fusion [3], which models feature selection for robust sensor fusion.
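To make the two fusion variants more tangible, the following sketch shows a late-fusion classification head and a squeeze-and-excitation-style cross-modal gating block loosely following the MMTM idea [7]. It is written in PyTorch purely for illustration; all names, layer sizes, and the exact gating are assumptions, not our actual implementation.

    import torch
    import torch.nn as nn

    class MMTMBlock(nn.Module):
        """Cross-modal gating, loosely following MMTM [7]: squeeze both
        feature maps, compute a joint embedding, and re-weight each
        modality's channels before the next layer (intermediate fusion)."""
        def __init__(self, ch_img: int, ch_ts: int, ratio: int = 4):
            super().__init__()
            joint = (ch_img + ch_ts) // ratio
            self.fc_joint = nn.Linear(ch_img + ch_ts, joint)
            self.fc_img = nn.Linear(joint, ch_img)
            self.fc_ts = nn.Linear(joint, ch_ts)

        def forward(self, f_img, f_ts):
            # f_img: (B, C_img, H, W) spectrogram features
            # f_ts:  (B, C_ts, T) time-series features
            s_img = f_img.mean(dim=(2, 3))          # squeeze image branch
            s_ts = f_ts.mean(dim=2)                 # squeeze time-series branch
            z = torch.relu(self.fc_joint(torch.cat([s_img, s_ts], dim=1)))
            g_img = torch.sigmoid(self.fc_img(z))   # per-channel gates
            g_ts = torch.sigmoid(self.fc_ts(z))
            return f_img * g_img[:, :, None, None], f_ts * g_ts[:, :, None]

    class LateFusionHead(nn.Module):
        """Late fusion: concatenate the high-level embeddings of both
        branches and classify with a single fully connected layer."""
        def __init__(self, dim_img: int, dim_ts: int, n_classes: int = 5):
            super().__init__()
            self.classifier = nn.Linear(dim_img + dim_ts, n_classes)

        def forward(self, emb_img, emb_ts):
            return self.classifier(torch.cat([emb_img, emb_ts], dim=1))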
To enable our proposed solution, we use a multi-stage framework (preprocessing of features, their fusion, classification, and uncertainty estimation). On a typical low-cost hardware platform, we record (at a low rate) 20 ms of raw IQ samples every second using a Software Defined Radio (SDR) with a bandwidth of 50 MHz. Our pre-processing also performs a Fast Fourier Transform (FFT) to transform the raw data (spatial features) into the frequency domain. Due to processing delays, there is consequently an information gap of about 980 ms between consecutive data frames. We fill this gap with high-rate samples of abstract characteristics, such as carrier-to-noise density ratio (C/N0) or automatic gain control (AGC) values (from the same sensor or even another sensor), or by interpolating the IQ values (temporal features). We feed the low-rate snapshots of spatial features as waterfall diagrams (i.e., images) to one branch of our MML estimator, a ResNet18 [6]. In contrast, we feed the high-rate sampled or interpolated time-series data to the other branch, a TS-Transformer [13]. We process the time-series and waterfall image data separately and perform a late or intermediate fusion of the features from the ResNet and the TS-Transformer for the final classification. Monte Carlo dropout is applied to the fused layers to assess the uncertainty of each estimate. In a final step, a fully connected layer and the softmax function yield the final class probabilities. We use renowned scores (e.g., the F2 score, i.e., the F-score with β = 2) to evaluate the performance and efficiency of our MML framework.
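The uncertainty step can be illustrated with a short Monte Carlo dropout sketch: the fused head is evaluated several times with its dropout layers kept active, and the averaged softmax output together with its predictive entropy serves as the uncertainty measure. The function name, the number of passes, and the assumption that the model takes both modalities in one forward pass are illustrative, not our exact code.

    import torch
    import torch.nn.functional as F

    def mc_dropout_predict(model, x_img, x_ts, n_passes: int = 30):
        model.eval()
        # Re-enable only the dropout layers, so that the stochasticity
        # comes from the fused layers while the rest stays deterministic.
        for m in model.modules():
            if isinstance(m, torch.nn.Dropout):
                m.train()

        with torch.no_grad():
            probs = torch.stack(
                [F.softmax(model(x_img, x_ts), dim=1) for _ in range(n_passes)]
            )                              # shape: (n_passes, B, n_classes)

        mean_probs = probs.mean(dim=0)     # averaged class probabilities
        # Predictive entropy as a simple per-sample uncertainty measure.
        entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
        return mean_probs.argmax(dim=1), mean_probs, entropy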
We benchmark our methods on realistic data recorded in a deterministic test environment. We use different sensors with different sampling rates, resolutions, and levels of signal abstraction (ranging from Android smartphones over low-cost consumer-grade SDR sensors to high-end geodetic-grade sensors). For classification, we use five main classes of interference, namely continuous wave (CW), chirp, noise, impulse, and broadband interference, which are typically very difficult to detect or classify, as their signal characteristics in the time and frequency domain are very similar with and without interference.
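For illustration, the short NumPy sketch below generates baseband examples of two of these classes, a CW tone and a linear chirp; the frequencies, sweep period, and the 50 MHz sampling rate are assumptions chosen for demonstration and do not reproduce the recorded measurement data.

    import numpy as np

    fs = 50e6                              # assumed complex sampling rate (50 MHz)
    t = np.arange(int(20e-3 * fs)) / fs    # one 20 ms snapshot

    # Continuous wave (CW): a single tone at a fixed frequency offset.
    f_cw = 2e6
    cw = np.exp(2j * np.pi * f_cw * t)

    # Chirp: a tone sweeping linearly across part of the band every 100 us.
    f0, f1, t_sweep = -10e6, 10e6, 100e-6
    tau = np.mod(t, t_sweep)
    chirp = np.exp(2j * np.pi * (f0 * tau + 0.5 * (f1 - f0) / t_sweep * tau**2))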
Anticipated Results
To evaluate our MML framework, we use the well-known pessimistic F2 score (i.e., the F-score with β = 2) to weight missed interference more heavily in the total error. In contrast to the state-of-the-art, we also evaluate the inference time and computational costs of all prominent methods both independently and end to end (within our framework). By comparing the inference performance of the complete system with data from two different sensors against data from a single sensor, we show the costs and benefits of the different variants. To our knowledge, we are the first to investigate the reliability of the method and its uncertainty with respect to multipath effects on GNSS and interference signals in constrained and complex environments, rather than limiting the evaluation to ideally controlled laboratory setups.
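As a minimal sketch of this metric, assuming a scikit-learn evaluation, the F-score with β = 2 weights recall (and thus missed interference) more heavily than precision; the labels below are placeholders only.

    from sklearn.metrics import fbeta_score

    y_true = [0, 1, 2, 3, 4, 1, 1, 0]   # ground-truth interference classes
    y_pred = [0, 1, 2, 3, 4, 1, 2, 0]   # classifier output
    score = fbeta_score(y_true, y_pred, beta=2, average="macro")
    print(f"F2 score (macro): {score:.3f}")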
Our first results show multi-class classification using our CNN architecture with an F1 score of over 93% on real data with noisy signals and multipath propagation. Therefore, we expect our MML with renowned methods (ResNet and TS-Transformer) to perform significantly more accurately than the individual methods without MML. In addition, we assume that using multimodal data from one or two different sensors will further improve performance and robustness. Furthermore, we expect that, thanks to the high-rate time-series component (TS-Transformer), we can even predict interference or classify and estimate it in real time.
In contrast to previous studies [10], which applied ML and Deep Learning (DL) methods to interference analysis, our framework is tested both on synthetic data from laboratory simulations and in a deterministic real environment. We evaluate all our methods using realistic data collected at the Fraunhofer test center in Nuremberg. The data contains signals with and without interference, with different distances between transmitter and receiver, different signal strengths, different movement dynamics, and different multipath propagation. Our indoor scenario also attenuates signals to limit interference with nearby real-time GNSS receivers. The GNSS signals are relayed via a repeater, with a receiver mounted on the roof of the test center and a ceiling transmitter, to compensate for the 20 to 50 dB attenuation caused by the reinforced concrete building. Naturally, we compare our results on this realistic data with the most modern approaches to interference classification.
Conclusions
The fundamental idea of using both spatial and time-sensitive features in our data-driven fusion framework allows us to reliably identify and categorize various interference signals even in multipath situations.
Our experiments show that even with a low sampling rate (1 snapshot per second) of the waterfall diagrams, our MML algorithm enables very accurate classification, thus reducing the computational effort and improving efficiency.
Our MML classifier improves interference classification even in complex multipath environments (e.g., in tunnels or buildings) and reduces computational costs, as it implicitly combines the local phenomena extracted by the image classifier with time-series features from a single GNSS sensor.
Significance
A multipath-resistant classification algorithm is crucial for real-world applications and could help find and mitigate any sources of interference.
In addition, we are the first to propose the multimodal fusion of complementary information in GNSS data with data-driven methods. Our approach allows us to extract local spatial information from images and their temporal relationships in a targeted manner, without complex modeling.
Furthermore, our approach may predict sources of interference using the temporal method, may tolerate longer gaps in the computationally expensive image data, and enables a significant reduction in computation and energy costs.
References
[1] Mahdi Abavisani, Hamid Reza Vaezi Joze, and Vishal M Patel. Improving the performance of unimodal dynamic hand-gesture recognition with multimodal training. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1165–1174, 2019.
[2] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
[3] Changhao Chen, Stefano Rosa, Yishu Miao, Chris Xiaoxuan Lu, Wei Wu, Andrew Markham, and Niki Trigoni. Selective sensor fusion for neural visual-inertial odometry. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pages 10542–10551, 2019.
[4] Kunihiko Fukushima and Hayaru Shouno. Deep convolutional network neocognitron: improved interpolating-vector. In Intl. Joint Conf. on Neural Networks (IJCNN), pages 1–8. IEEE, 2015.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[7] Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L Iuzzolino, and Kazuhito Koishida. MMTM: Multimodal transfer module for CNN fusion. In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pages 13289–13299, 2020.
[8] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. Large-scale video classification with convolutional neural networks. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1725–1732, 2014.


