Globally-Referenced Electro-Optical SLAM for Collaborative Mapping and All-Weather Localization
Lakshay Narula, Michael Wooten, Matthew Murrian, Daniel LaChapelle, and Todd Humphreys, The University of Texas at Austin
All major automakers today are engrossed in the development and integration of software and sensors that enable automated vehicles. Localization within a map is one of the primary operations that automated vehicles must perform, either to navigate from one location to another, or, more interestingly, to interact with their surroundings within a mapped environment. Prior high-definition digital maps allow the vehicle to expect the expected, that is, they relieve the system of the need to classify static features.
Satellite-based navigation sensors have historically been the unrivalled sensor of choice for localization. However, the high-reliability decimeter-level accuracy demanded by automated vehicles for lane-keeping and other applications has significantly changed this landscape. In fact, in most automated vehicles being developed, the GPS/GNSS system is a secondary sensor whose only role is to loosely constrain (within a few meters) the primary sensor data to a global reference when building a digital map. Other vehicular sensors such as visual cameras and LiDAR are being used as primary systems for vehicle localization within the prior map.
The need for accurate digital maps has spurred dedicated map-making campaigns involving fleets of specialized mapping vehicles. Mapping vehicles typically employ state-of-the-art high-performance sensors that are too expensive to be installed on consumer vehicles. Although these exquisite maps do enable sub-decimeter-accurate within-map localization, their construction and use come with important limitations:
1. In current practice, sub-decimeter vehicle localization within a prior map is critically dependent on optical cameras and LiDAR. LiDAR is known to fail in heavy rain, snow, and fog. Optical cameras are vulnerable to poor lighting conditions and are easily blinded by bright light. Moreover, the previously mapped roadside environment can be significantly altered by the buildup of snow or sand. Thus, even after severe weather subsides, a vision- and LiDAR-dependent system may have difficulty locating the host vehicle within the prior map. Such weather-induced difficulties for within-map localization cannot be dismissed as negligibly rare: many populated regions of the globe are routinely subject to punishing weather.
2. While mapping with a specialized fleet is feasible for individual cities, it is time-consuming and cumbersome to map and, more importantly, to maintain the maps of entire countries or continents. A key enabler for large-scale up-to-date maps will be enlisting the help of the very vehicles that need the map – consumer vehicles – to build and update it. However, consumer vehicles will only be equipped with low-cost consumer-grade sensor suites. The performance of such sensors in creating high-precision maps has not been explored.
3. Automated vehicles would also need accurate information on the position, velocity, and intent of their neighboring vehicles. While this information can be inferred using the sensors on the vehicle, in certain situations, for example at a blind corner or during a left-turn maneuver, it might be beneficial for the vehicles to communicate this information to each other over a wireless channel. For data exchange between vehicles, position and velocity information must be referenced to a common coordinate frame – preferably a global reference frame. However, the maps commonly generated using optical cameras and LiDAR are, as mentioned earlier, only loosely tied to the global frame (e.g., WGS-84), with an exact correspondence that differs from provider to provider: Waymo, Uber, and HERE would each assign different coordinates to the same physical object, and these coordinates could differ by a meter or more – far too large a discrepancy for coordinated automated driving.
Unlike optical cameras and LiDAR, GPS/GNSS is agnostic to weather elements, lighting conditions, etc. Thus, it is a natural complement to vision and LiDAR-based sensing. Its chief impairments, signal blockage and multipath, mostly occur in urban areas where vision and radar sensors can aid the mapping and localization solution. Also, as a globally-referenced source of absolute position and velocity, GPS/GNSS is very useful for sharing location and velocity data among vehicles. Radar is another sensor that works in all weather conditions. The radar measurements, while sufficiently accurate, may not be dense enough to effectively constrain a SLAM algorithm or to locate the vehicle in open areas where roadside features are scarce. Nonetheless, these measurements provide information that is useful for all-weather navigation. Surprisingly, radar has remained mostly unexplored for the purposes of mapping and localization [1,2].
This paper explores a sensor fusion scheme using GNSS, radar, and visible-light cameras for sub-30-centimeter-accurate globally-referenced collaborative sparse mapping using low-cost sensors, and for sub-30-centimeter-accurate globally-referenced localization in the resulting map. This fusion is termed GEOSLAM: Globally-Referenced Electro-Optical Simultaneous Localization and Mapping.
This paper explores two research problems, as detailed below:
1. Visual SLAM for Radar Mapping: GEOSLAM includes a sparse visual features-based SLAM system that performs visual odometry and windowed bundle adjustment (WBA) with a stereo camera setup to create a sparse feature point cloud of its surroundings. This visual SLAM pipeline performs comparably to top-ranking camera-based methods benchmarked on the KITTI dataset, such as ORB-SLAM2 [3]. GEOSLAM is unique in that it ingests as measurements the globally-referenced GNSS solutions from a software-defined GNSS receiver, GRID, that has been developed at the UT Radionavigation Lab over the last decade. The integration of visual SLAM with GNSS estimates follows the method described in [4]. The GNSS measurements (1) enable GEOSLAM to render globally-referenced sparse point clouds on days with favorable weather and lighting conditions, and (2) reduce the drift of the visual SLAM solution whenever GNSS position measurements are available. However, as mentioned earlier, these visual features-based maps may not be sufficient for localization in adverse weather conditions. To this end, GEOSLAM creates a globally-referenced map of radar targets. This is enabled by the range and angle measurements obtained from the radar in the body frame, and the global 6-degrees-of-freedom pose estimated by the GNSS-aided visual SLAM system on visually favorable days. On days with poor visual conditions, GEOSLAM leverages the existing radar map to localize the vehicle with sub-30-centimeter accuracy. This pipeline side-steps the problem of direct SLAM using radar measurements, which may not be constrained sufficiently by the sparse and two-dimensional (range and angle) measurements provided by the radar.
2. Accurate Collaborative Mapping: Standalone visual SLAM algorithms drift from the truth as a function of the distance traveled by the mapping agent, typically at a rate of about 1% of the distance traveled. As mentioned above, this drift can be contained using GNSS position estimates whenever they are available. However, the accuracy of the GNSS standard positioning service (SPS) estimates themselves is on the order of 1-3 meters. As a result, a single-session use of SPS is insufficient to create a map with sub-30-centimeter accuracy. One possible solution is to ingest carrier-phase differential GNSS (CD-GNSS) position measurements instead. However, this solution relies on the existence of a reference station for differencing the measurements. Meter-level errors in the SPS solutions are largely due to multipath and errors in modeling atmospheric effects. Over multiple runs through a given area, it is reasonable to expect that multipath errors in the SPS solution will average to zero, owing to the changing satellite geometry at each session. The errors in modeling ionospheric and tropospheric delay would also ideally be zero-mean. Thus, this paper explores the possibility of sub-30-centimeter-accurate collaborative visual SLAM with multiple sessions contributed by collaborating agents with access to SPS position measurements.
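As a rough illustration of the two problems above, the following Python sketch shows (a) how a radar range-and-angle detection could be placed in a global frame given the vehicle pose, and (b) how many SPS sessions idealized averaging would require. It is not the GEOSLAM implementation. The function names, the planar (2D) simplification of the 6-degrees-of-freedom pose, the angle conventions, and the assumption of independent zero-mean session errors are all hypothetical choices made for illustration.

```python
import math


def radar_target_to_global(vehicle_east, vehicle_north, heading_rad,
                           range_m, bearing_rad):
    """Place a radar detection in a local East-North frame (2D simplification).

    Assumed conventions (not from the paper):
      - body frame: x forward, y left; bearing is counter-clockwise from x
      - heading is counter-clockwise from East
    """
    # Polar (range, bearing) to Cartesian coordinates in the body frame.
    x_body = range_m * math.cos(bearing_rad)
    y_body = range_m * math.sin(bearing_rad)

    # Rotate by the vehicle heading, then translate by the vehicle position.
    cos_h, sin_h = math.cos(heading_rad), math.sin(heading_rad)
    east = vehicle_east + cos_h * x_body - sin_h * y_body
    north = vehicle_north + sin_h * x_body + cos_h * y_body
    return east, north


def sessions_needed(sigma_single_m, target_m):
    """Sessions required for the error of the mean to reach target_m.

    Assumes session errors are independent and zero-mean, so that the
    standard error of the mean is sigma_single_m / sqrt(N).
    """
    if sigma_single_m <= 0 or target_m <= 0:
        raise ValueError("error magnitudes must be positive")
    return math.ceil((sigma_single_m / target_m) ** 2)


# Example: vehicle at the origin heading North, target 10 m straight ahead.
target_east, target_north = radar_target_to_global(0.0, 0.0, math.pi / 2, 10.0, 0.0)

# Example: sessions needed to bring a 3 m single-session error down to 0.3 m.
n_sessions = sessions_needed(3.0, 0.3)
```

Under these idealized assumptions, reducing a 3-meter single-session error to 30 centimeters would take on the order of 100 sessions. Errors that are correlated across sessions would require more sessions, and any common bias would not average out at all.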
The sensor suite used in this project consists of two GNSS antennas connected to a custom front-end developed in-house, two visible-light cameras, and a 77 GHz FMCW radar unit. The sensors are rigidly mounted to a platform that can be strapped onto any vehicle. The onboard GNSS system also provides CD-GNSS solutions that can be used as ground truth in light-urban settings.
The envisioned project would produce (1) a novel WBA-based algorithm that fuses camera images with GNSS SPS positions and tracked radar targets to produce an always-available and highly reliable vehicle pose, (2) a sparse globally-referenced visual and radar point cloud map of a section of Austin as a by-product of the WBA-based algorithm, and (3) a prototype hardware platform featuring low-cost sensors.
[1] F. Schuster, C. G. Keller, M. Rapp, M. Haueis, and C. Curio, “Landmark based radar slam using graph optimization,” in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016, pp. 2559–2564.
[2] F. Schuster, M. Worner, C. G. Keller, M. Haueis, and C. Curio, “Robust localization based on radar signal clustering,” in Intelligent Vehicles Symposium (IV), 2016 IEEE. IEEE, 2016, pp. 839–844.
[3] R. Mur-Artal and J. D. Tardos, “ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGB-D cameras,” arXiv preprint arXiv:1610.06475, 2016.
[4] D. P. Shepard and T. E. Humphreys, “High-precision globally-referenced position and attitude via a fusion of visual SLAM, carrier-phase-based GPS, and inertial measurements,” in Proceedings of the IEEE/ION PLANS Meeting, May 2014.