### Session E3a: All-Source Intelligent PNT Method


**Reinforcement Learning Framework for Robust Navigation in GNSS Receivers**

*David Contreras Franco, Iñigo Cortés, Georgios Kontes, Tobias Feigl, Christopher Mutschler, and Alexander Rügamer, Fraunhofer IIS*

**Date/Time:** Thursday, Sep. 19, 8:35 a.m.

Peer Reviewed

Robust synchronization of the code phase, carrier Doppler, and carrier phase is essential in global navigation satellite system (GNSS) receivers to achieve a continuous and reliable position, velocity, and time (PVT) solution (Morton et al., 2020). The acquisition and tracking stages represent the synchronization phase of a GNSS receiver. Acquisition provides a rough estimate of the code phase and Doppler frequency, whereas tracking refines the code phase, Doppler frequency, and carrier phase synchronization parameters (Kaplan & Hegarty, 2006; Teunissen & Montenbruck, 2017).

The tracking stage contains a tracking channel per GNSS signal. Each tracking channel consists of three tracking loops that perform the synchronization of the GNSS signal’s parameters: delay-locked loop (DLL), frequency-locked loop (FLL), and phase-locked loop (PLL). The configuration parameters of each tracking loop, which determine the tracking performance against noise and signal dynamics, are the discriminator type, the loop bandwidth, the integration time, the order, and the correlator spacing (Jwo, 2001; Gardner, 2005).
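The per-loop configuration described above can be sketched as a small data structure; the field names and the example values below are illustrative placeholders, not settings from the cited work:

```python
from dataclasses import dataclass

@dataclass
class LoopConfig:
    """Configuration of a single tracking loop (names are illustrative)."""
    discriminator: str          # e.g. "atan" for a PLL, "early-minus-late" for a DLL
    bandwidth_hz: float         # loop noise bandwidth
    integration_time_s: float   # coherent integration time
    order: int                  # loop filter order (1, 2, or 3)
    correlator_spacing: float   # early-late spacing in chips (DLL only)

# One tracking channel bundles a DLL, an FLL, and a PLL:
channel = {
    "DLL": LoopConfig("early-minus-late", bandwidth_hz=1.0,
                      integration_time_s=0.02, order=1, correlator_spacing=0.5),
    "FLL": LoopConfig("cross-product", bandwidth_hz=5.0,
                      integration_time_s=0.02, order=2, correlator_spacing=0.0),
    "PLL": LoopConfig("atan", bandwidth_hz=15.0,
                      integration_time_s=0.02, order=3, correlator_spacing=0.0),
}
```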

A fixed configuration of the tracking loops compromises performance in time-varying scenarios characterized by varying noise levels, dynamics, and fading effects. For instance, a tracking architecture with a narrow loop bandwidth and long coherent integration (LCI) improves tracking sensitivity (Pany et al., 2009; Gowdayyanadoddi et al., 2015; Xie & Petovello, 2015) at the cost of being susceptible to signal dynamics. Conversely, a wide loop bandwidth and a short integration time enhance robustness against high signal dynamics while degrading tracking sensitivity (Cortés et al., 2022b). Therefore, a fixed tracking-loop configuration is a sub-optimal solution for time-varying scenarios.
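The sensitivity side of this trade-off can be made concrete with the standard rule-of-thumb for the 1-sigma thermal-noise phase jitter of a PLL (Kaplan & Hegarty, 2006), which grows with the loop noise bandwidth $B_n$ and shrinks with the coherent integration time $T$ and the carrier-to-noise density ratio $C/N_0$ (in linear units):

$$
\sigma_{tPLL} = \frac{360}{2\pi}\sqrt{\frac{B_n}{C/N_0}\left(1 + \frac{1}{2\,T\,(C/N_0)}\right)} \quad [\text{degrees}]
$$

Narrowing $B_n$ and lengthening $T$ therefore lowers the thermal-noise jitter (better sensitivity), while the dynamic stress error, which scales inversely with a power of $B_n$ set by the loop order, grows.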

Recent research presented an adaptive control algorithm to adjust the response time of feedback control systems based on the noise and dynamic information from the system (Cortés et al., 2019). A particular implementation of this algorithm is the loop-bandwidth control algorithm (LBCA): an adaptive tracking technique that updates the loop bandwidth based on a weighted difference between the mean and the standard deviation of the discriminator’s output (Cortés et al., 2020). The LBCA consists of four main steps. First, the dynamics and noise information of the innovation are estimated; in a tracking loop, the innovation is represented by the discriminator’s output, and the LBCA can use its absolute mean and standard deviation as features representing the dynamics and the noise, respectively. Second, the features are weighted based on the current loop bandwidth; the weighting functions are a linear combination of positive, continuous, and increasing functions whose shape was, in previous research, determined with a model-based approach. Third, the weighted features are combined into a control value. Finally, the control value updates the current loop bandwidth; previous implementations include a Schmitt trigger to avoid noise-induced instabilities during the bandwidth update. The LBCA has been extensively tested and applied to more advanced tracking schemes (Cortés et al., 2021, 2022a,b). The results confirm the superior performance of LBCA-based adaptive tracking compared to state-of-the-art adaptive tracking techniques while maintaining low complexity.
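The four steps can be sketched as a single update function; the weighting functions, bandwidth limits, and hysteresis threshold below are illustrative placeholders, not the tuned values from the cited work:

```python
import numpy as np

def lbca_step(disc_out, bw, w_dyn, w_noise, bw_min=0.5, bw_max=30.0, hyst=0.1):
    """One simplified LBCA update (sketch; parameters are illustrative).

    disc_out : array of recent discriminator outputs (the loop innovation)
    bw       : current loop bandwidth in Hz
    w_dyn, w_noise : weighting functions evaluated at the current bandwidth
    """
    # Step 1: estimate dynamics and noise features from the innovation.
    f_dyn = np.abs(np.mean(disc_out))   # absolute mean -> signal dynamics
    f_noise = np.std(disc_out)          # standard deviation -> noise level

    # Steps 2-3: weight the features with bandwidth-dependent functions and
    # combine them into a control value (positive widens, negative narrows).
    control = w_dyn(bw) * f_dyn - w_noise(bw) * f_noise

    # Step 4: a Schmitt-trigger-like dead zone suppresses noise-driven updates.
    if abs(control) < hyst:
        return bw
    return float(np.clip(bw + control, bw_min, bw_max))
```

With a noise-only innovation, the standard deviation dominates the absolute mean, so the control value is negative and the bandwidth narrows, as expected.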

The primary limitation of the adaptive control algorithm lies in the need for precise tuning to select suitable features and weighting functions. Better features capable of discriminating between dynamics and noise events more effectively may exist. Additionally, the shaping of the weighting function has been determined based on a hand-crafted model approach derived from the three-sigma rule of thumb (Cortés et al., 2021). However, this model only accounts for the innovation’s steady-state error dynamics and thermal noise, neglecting transient dynamics or Allan variance from the oscillator.

Accurately selecting features and weighting functions is crucial for enhancing the tracking loop’s performance. This paper proposes a reinforcement learning (RL) framework to search for the optimal weighting functions based on selected features.

Machine learning (ML) is a hot topic in GNSS, demonstrating outstanding performance across various applications (Siemuri et al., 2022). However, ML comes with several design and application considerations. First, determining the most suitable ML model type and structure, and training the model, can be laborious, especially for complex models (Goodfellow et al., 2016). Second, ML models usually do not extrapolate or generalize well far from the training data, which limits their reusability on new datasets and environments. Third, ML is not a silver bullet that boosts performance in all cases: existing, well-established algorithms and classical models might already perform well, in which case no ML model needs to be trained and deployed.

In scenarios without domain knowledge, a fully data-driven approach becomes a viable solution despite the substantial training effort it demands. However, when a model is well-established, opting for a model-based approach is often the most efficient choice, eliminating the need for ML.

An intriguing scenario arises when some domain knowledge exists and approximate, though not optimal, models can be applied. In such cases, ML can be employed effectively, and the training complexity decreases thanks to the availability of these approximate models. This approach strikes a balance between leveraging domain knowledge and harnessing the power of ML. The LBCA shows such a balance: the features and the relation between the weighting functions are predefined by domain knowledge, whereas the search for the weighting functions can be done using ML.

Determining the ideal bandwidth for each sample is not feasible; therefore, supervised learning is not the appropriate approach. Among existing ML techniques, RL is the one that best suits this architecture. Most RL techniques implemented in GNSS focus on the navigation engine (Tang et al., 2023; Barzegar & Lee, 2022; Li et al., 2022; Gao et al., 2020; Dasgupta et al., 2022). However, to the authors’ knowledge, there has been no research contribution on RL techniques to adapt the response time of the tracking loops in a GNSS receiver. This paper introduces an RL framework to optimize the adaptive control algorithm. In this framework, the environment represents the tracking channels and the navigation engine, while the agent includes the adaptive control algorithm. To simplify, consider the optimization of the adaptive control algorithm targeting the update of the PLL’s response time. The state vector contains the PLL’s discriminator output and the PLL’s loop bandwidth of each tracking channel. The action corresponds to the updated loop bandwidth of each channel. As the reward function guiding the optimization, the phase-lock indicator (PLI) is utilized.
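The state, action, and reward described above can be sketched as a toy environment; the internal phase-error dynamics, noise levels, and the cosine-based PLI approximation are invented for illustration and are not the receiver model from the paper:

```python
import numpy as np

class TrackingEnv:
    """Toy stand-in for the tracking-channel environment (illustrative only).

    State : (PLL discriminator output, PLL loop bandwidth) per channel.
    Action: updated loop bandwidth per channel.
    Reward: a phase-lock-indicator-like score, here cos(2 * phase error),
            which approaches 1 when the carrier phase is locked.
    """
    def __init__(self, n_channels=4):
        self.n = n_channels
        self.bw = np.full(self.n, 15.0)
        self.phase_err = np.zeros(self.n)

    def state(self):
        # Discriminator output: phase error plus measurement noise.
        disc = self.phase_err + np.random.normal(0.0, 0.05, self.n)
        return np.stack([disc, self.bw], axis=1)

    def step(self, new_bw):
        self.bw = np.clip(new_bw, 0.5, 30.0)
        # Toy model: wider bandwidth admits more noise into the loop.
        self.phase_err = 0.9 * self.phase_err + np.random.normal(0.0, 0.01 * self.bw)
        pli = float(np.mean(np.cos(2.0 * self.phase_err)))
        return self.state(), pli
```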

There is a plethora of reinforcement learning algorithms addressing different types of problems (Arulkumaran et al., 2017). Considering the nature of the problem at hand, a policy-search (policy-gradient) algorithm has been selected (Deisenroth et al., 2013). The presented adaptive control algorithm can be integrated into the policy function, and the REINFORCE algorithm (Williams, 2004) is used to optimize the parameters of the policy function.
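A minimal sketch of one REINFORCE update for a Gaussian policy, assuming a linear mean and fixed exploration noise (in the paper the LBCA itself serves as the policy function; this simplified form only shows the gradient mechanics):

```python
import numpy as np

def reinforce_update(theta, episodes, lr=0.01, sigma=1.0):
    """One REINFORCE step for a Gaussian policy a ~ N(theta . x, sigma^2).

    episodes : list of (features x, action a, return R) triples.
    The log-likelihood gradient is grad log pi = (a - theta . x) * x / sigma^2,
    so each sample contributes that gradient scaled by its return.
    """
    grad = np.zeros_like(theta)
    for x, a, ret in episodes:
        grad += (a - theta @ x) * x / sigma**2 * ret
    return theta + lr * grad / max(len(episodes), 1)
```

Actions that earned a high return pull the policy mean toward themselves; actions with a negative return push it away, which is the core of the policy-gradient idea.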

Two setups are prepared to perform and verify the RL training of the LBCA. First, for an initial analysis, an RL framework is implemented in Python to train the LBCA using synthetic signals. Second, a test automation using a radio-frequency constellation simulator (RFCS), a GOOSE© receiver, and a GPU is performed. The RFCS GSS9000 creates the static and dynamic simulated scenarios and outputs the radio-frequency signal to the GOOSE© single board computer (SBC). The GOOSE© platform, developed by Fraunhofer IIS and marketed through TeleOrbit GmbH, is a GNSS receiver with an open software interface (Overbeck et al., 2015; Seybold, 2020). The open software interface makes it possible to implement the adaptive ultra-tight integration architecture and to conduct a deep analysis of the tracking performance. A user computer controls the RFCS GSS9000 and the GOOSE© SBC through the transmission control protocol (TCP) and performs the test automation. The information on the tracking loops of the GOOSE© receiver is sent to the GPU. Then, the training of the LBCA is performed on the GPU, which sends the updated bandwidth to the GOOSE© receiver. Finally, the GOOSE© receiver updates the bandwidth of the selected tracking loops.
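One cycle of this telemetry-train-update loop could look as follows; the JSON message format, command names, and the `gpu_trainer` callback are entirely hypothetical, invented to illustrate the data flow, and do not reflect the actual GOOSE© interface:

```python
import json
import socket

def automation_step(receiver_addr, gpu_trainer):
    """One cycle of a (hypothetical) test-automation loop over TCP:
    fetch tracking telemetry from the receiver, run one training step
    on the GPU side, and push the updated bandwidths back."""
    with socket.create_connection(receiver_addr, timeout=5.0) as sock:
        # Request the current tracking-loop telemetry (invented protocol).
        sock.sendall(b'{"cmd": "get_tracking_state"}\n')
        telemetry = json.loads(sock.makefile().readline())
        # Train the LBCA on the GPU side and obtain new bandwidths.
        new_bw = gpu_trainer(telemetry)
        # Send the updated bandwidths back to the receiver.
        sock.sendall(json.dumps({"cmd": "set_bandwidth",
                                 "bw": new_bw}).encode() + b"\n")
```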

This research presents four main contributions. First, it is demonstrated that the LBCA can serve as the policy function of a reinforcement learning structure. Second, a method to search for the probability density function of the policy function is proposed. Third, it is proven that the LBCA is trainable: the best weighting functions in the LBCA can be learned. Fourth, a proof of concept of an RL framework in tracking is presented. To the authors’ knowledge, this is the first time RL has been implemented in the tracking stage of a GNSS receiver.

References:

Arulkumaran, K., Deisenroth, M. P., Brundage, M., and Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38.

Barzegar, A. and Lee, D.-J. (2022). Deep reinforcement learning-based adaptive controller for trajectory tracking and altitude control of an aerial robot. Applied Sciences, 12(9).

Cortés, I., Conde, N., Van der Merwe, J. R., Simona Lohan, E., Nurmi, J., and Felber, W. (2022a). Low-complexity adaptive direct-state Kalman filter for robust GNSS carrier tracking. In 2021 International Conference on Localization and GNSS (ICL-GNSS).

Cortés, I., Rügamer, A., van der Merwe, J. R., Overbeck, M., and Strobel, C. (2019). Adaptive weighting matrix for adaptive tracking loops. EU Patent, EP3816672.

Cortés, I., Urquijo, S., Overbeck, M., Felber, W., Agrotis, L., Mayer, V., Schonemann, E., and Enderle, W. (2022b). Robust tracking strategy for modern GNSS receivers in sounding rockets. In ESA Workshop on Satellite Navigation User Equipment Technologies (NAVITEC).

Cortés, I., van der Merwe, J. R., Nurmi, J., Rügamer, A., and Felber, W. (2021). Evaluation of adaptive loop-bandwidth tracking techniques in GNSS receivers. Sensors, 21(2).

Cortés, I., Van der Merwe, J. R., Rügamer, A., and Felber, W. (2020). Adaptive loop-bandwidth control algorithm for scalar tracking loops. In Proceedings of IEEE/ION PLANS.

Dasgupta, S., Ghosh, T., and Rahman, M. (2022). A reinforcement learning approach for global navigation satellite system spoofing attack detection in autonomous vehicles. Transportation Research Record, 2676(12):318–330.

Deisenroth, M. P., Neumann, G., Peters, J., et al. (2013). A survey on policy search for robotics. Foundations and Trends® in Robotics, 2(1–2):1–142.

Gao, X., Luo, H., Ning, B., Zhao, F., Bao, L., Gong, Y., Xiao, Y., and Jiang, J. (2020). Rl-AKF: An adaptive Kalman filter navigation algorithm based on reinforcement learning for ground vehicles. Remote Sensing, 12(11).

Gardner, F. M. (2005). Phaselock Techniques. Wiley, 3 edition.

Glynn, P. W. and L’Ecuyer, P. (1995). Likelihood ratio gradient estimation for stochastic recursions. Advances in Applied Probability, 27(4):1019–1053.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning. MIT Press.

Gowdayyanadoddi, N., Curran, J. T., Broumandan, A., and Lachapelle, G. (2015). A ray-tracing technique to characterize GPS multipath in the frequency domain. International Journal of Navigation and Observation, 2015:1–16.

Jwo, D.-J. (2001). Optimisation and sensitivity analysis of GPS receiver tracking loops in dynamic environments. In IEE Proceedings - Radar, Sonar and Navigation, pages 241–250.

Kaplan, E. D. and Hegarty, C. J. (2006). Understanding GPS: Principles and Applications. Artech House mobile communications series. Artech House, 2 edition.

Li, X., Tang, X., Wang, X., Tan, W., Zheng, J., and Zhao, Y. (2022). Application of reinforcement learning in sins/GNSS/dvl integrated navigation. In 2022 International Conference on Machine Learning, Cloud Computing and Intelligent Mining (MLCCIM), pages 346–352.

Morton, Y. J., Yang, R., and Breitsch, B. (2020). GNSS Receiver Signal Tracking, chapter 15, pages 339–375. John Wiley & Sons, Ltd.

Overbeck, M., Garzia, F., Popugaev, A., Kurz, O., Forster, F., Felber, W., Ayaz, A. S., Ko, S., and Eissfeller, B. (2015). GOOSE - GNSS receiver with an open software interface. In Proceedings of the 28th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS 2015).

Pany, T., Riedl, B., Winkel, J., Woerz, T., Schweikert, R., Niedermeier, H., Lagrasta, S., Lopez-Risueno, G., and Jiménez-Baños, D. (2009). Coherent integration time: The longer, the better. Inside GNSS, 4:52–61.

Seybold, J. (2020). GOOSE: Open GNSS Receiver Platform. Technical report, TeleOrbit GmbH. Available Online: https://teleorbit.eu/en/satnav/.

Siemuri, A., Selvan, K., Kuusniemi, H., Valisuo, P., and Elmusrati, M. S. (2022). A systematic review of machine learning techniques for GNSS use cases. IEEE Transactions on Aerospace and Electronic Systems, 58(6):5043–5077.

Tang, J., Chen, X., Li, Z., Zhao, H., Xie, S., Xie, K., Kuzin, V., and Li, B. (2023). Log-regularized dictionary learning-based reinforcement learning algorithm for GNSS positioning correction. IEEE Internet of Things Journal, pages 1–1.

Teunissen, P. J. and Montenbruck, O. (2017). Springer Handbook of Global Navigation Satellite Systems. Springer, 1 edition.

Van Der Merwe, J. R., Franco, D. C., Feigl, T., and Rügamer, A. (2024). Optimal machine learning and signal processing synergies for low-resource GNSS interference classification. IEEE Transactions on Aerospace and Electronic Systems, pages 1–18.

Williams, R. J. (2004). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256.

Xie, P. and Petovello, M. G. (2015). Measuring GNSS multipath distributions in urban canyon environments. IEEE Transactions on Instrumentation and Measurement, 64(2):366–377.

