If you see a self-driving car out in the wild, you might notice a giant spinning cylinder on top of its roof. That’s a lidar sensor, and it works by sending out pulses of infrared light and measuring the time it takes for them to bounce off objects. This creates a map of 3D points that serve as a snapshot of the car’s surroundings.
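To make the time-of-flight idea concrete, here is a minimal sketch in Python of how one pulse's round-trip time could be turned into a distance and then into a 3D point. The function names, the angle convention, and the 200-nanosecond example are illustrative assumptions, not details of any particular sensor.

```python
import math

# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_s: float) -> float:
    """Distance to the reflecting surface for one pulse.

    The pulse travels out and back, so the one-way distance is half of
    the total path covered during the round trip.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def to_xyz(range_m: float, azimuth_rad: float, elevation_rad: float):
    """Convert a range plus beam angles into an (x, y, z) point.

    Assumed convention (hypothetical): x points forward, y to the left,
    z up; azimuth is measured in the x-y plane, elevation from that plane.
    """
    horizontal = range_m * math.cos(elevation_rad)
    return (
        horizontal * math.cos(azimuth_rad),
        horizontal * math.sin(azimuth_rad),
        range_m * math.sin(elevation_rad),
    )


# Hypothetical echo received 200 nanoseconds after the pulse was emitted.
distance_m = range_from_round_trip(200e-9)
print(f"{distance_m:.2f} m")  # prints "29.98 m"
print(to_xyz(distance_m, azimuth_rad=0.0, elevation_rad=0.0))
```

Repeating this for every pulse, across every beam angle the sensor sweeps, is what builds up the cloud of 3D points.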
One downside of lidar is that its 3D data is immense and computationally intensive to process. A typical 64-channel sensor, for example, produces more than 2 million points per second. Because of the additional spatial dimension, state-of-the-art 3D models require 14x more computation at inference time than their 2D image counterparts. This means that, in order to navigate effectively, engineers typically first have to collapse the data into 2D, which introduces significant information loss.
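As a rough illustration of why collapsing to 2D loses information, the sketch below projects points onto a top-down grid and keeps only the highest point in each cell. This is one common style of 2D projection, chosen here as an assumption; the grid size, cell size, and sample points are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def collapse_to_height_map(
    points: Iterable[Point],
    x_range: Tuple[float, float] = (0.0, 4.0),
    y_range: Tuple[float, float] = (-2.0, 2.0),
    cell_size: float = 1.0,
) -> List[List[Optional[float]]]:
    """Project 3D points onto a top-down grid, keeping the max z per cell."""
    n_rows = int((x_range[1] - x_range[0]) / cell_size)
    n_cols = int((y_range[1] - y_range[0]) / cell_size)
    grid: List[List[Optional[float]]] = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        row = int((x - x_range[0]) // cell_size)
        col = int((y - y_range[0]) // cell_size)
        # Points outside the grid are simply dropped.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            current = grid[row][col]
            if current is None or z > current:
                grid[row][col] = z
    return grid


# Two hypothetical returns that fall into the same grid cell:
# one from the road surface, one from something overhanging it.
cloud = [(1.2, 0.3, 0.1), (1.4, 0.6, 1.8)]
height_map = collapse_to_height_map(cloud)
print(height_map[1][2])  # prints 1.8
```

Both points land in the same cell, so the 2D map records only the 1.8 m return; the fact that there was also open road underneath it is gone.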
But a team from MIT has been working on a self-driving system that uses machine learning so that custom hand-tuning isn’t needed. Their new end-to-end framework can navigate autonomously using only raw 3D point cloud data and low-resolution GPS maps, similar to those available on smartphones today.
End-to-end learning from raw lidar data is computationally intensive, since it involves giving the computer huge amounts of rich sensory information for learning how to steer. Because of this, the team had to design new deep learning components that leverage modern GPU hardware more efficiently in order to control the vehicle in real time.
“We’ve optimized our solution from both algorithm and system perspectives, achieving a cumulative speedup of roughly 9x compared to existing 3D lidar approaches,” says Ph.D. student Zhijian Liu, who was co-lead author on the paper alongside Alexander Amini.
In tests, the researchers showed that their system reduced how often a human driver had to take over control from the machine, and could even withstand severe sensor failures.
For example, picture yourself driving through a tunnel and then emerging into the sunlight: for a split second, your eyes will likely have trouble seeing because of the glare. A similar problem arises with the cameras in self-driving cars, as well as with the systems’ lidar sensors when weather conditions are poor.
To handle this, the MIT team’s system can estimate how certain it is about any given prediction, and can therefore give more or less weight to that prediction in making its decisions. (In the case of emerging from a tunnel, it would essentially disregard any prediction that should not be trusted due to inaccurate sensor data.)
The team calls their approach “hybrid evidential fusion,” because it fuses the different control predictions together to arrive at its motion-planning choices.
“By fusing the control predictions according to the model’s uncertainty, the system can adapt to unexpected events,” says MIT professor Daniela Rus, one of the senior authors on the paper.
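The general idea of weighting predictions by how much they can be trusted can be sketched with simple inverse-variance weighting. This is a stand-in technique chosen for illustration, not the team's hybrid evidential fusion; the function name, the use of a variance as the uncertainty measure, and the sample numbers are all assumptions.

```python
from typing import Sequence, Tuple


def fuse_predictions(predictions: Sequence[Tuple[float, float]]) -> float:
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Each prediction's weight is 1 / variance, so a prediction the model
    is unsure about (large variance) contributes very little.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    weights = []
    for _, variance in predictions:
        if variance <= 0:
            raise ValueError("variance must be positive")
        weights.append(1.0 / variance)
    weighted_sum = sum(w * value for w, (value, _) in zip(weights, predictions))
    return weighted_sum / sum(weights)


# Hypothetical steering predictions in radians, each with a variance.
confident = (0.10, 0.01)   # sensor data looks reliable
unreliable = (0.90, 1.00)  # e.g. sensor blinded by glare
fused = fuse_predictions([confident, unreliable])
print(f"fused steering command: {fused:.3f} rad")
```

With these numbers the fused command stays close to the confident prediction, because the unreliable one carries only about one percent of the total weight.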
In many respects, the system itself is a fusion of three previous MIT projects:
- MapLite, a hand-tuned framework for driving without high-definition 3D maps
- “variational end-to-end navigation,” a machine learning system that is trained using human driving data to learn how to navigate from scratch
- SPVNAS, an efficient 3D deep learning solution that optimizes both the neural architecture and the inference library
“We’ve taken the benefits of a mapless driving approach and combined it with end-to-end machine learning so that we don’t need expert programmers to tune the system by hand,” says Amini.
As a next step, the team plans to continue scaling their system to handle increasing real-world complexity, including adverse weather conditions and dynamic interaction with other vehicles.