Tesla’s progress with artificial intelligence and neural nets has propelled its Autopilot and Full Self-Driving solutions to the front of the pack. This is the result of the brilliant work of a large team of Autopilot directors and staff, including Tesla’s Senior Director of AI, Andrej Karpathy. Karpathy presented Tesla’s methods for training its AI at the Scaled ML Conference in February. Along the way, he shared specific insights into Tesla’s methods for achieving the accuracy of traditional laser-based lidar with just a handful of cameras.
The secret sauce in Tesla’s ever-evolving solution is not the cameras themselves, but rather the advanced processing and neural nets the company has built to make sense of the wide range and quality of inputs. One new technique Tesla’s AI team has built is called pseudo-lidar. It blurs the line between traditional computer vision and the powerful point-cloud world of lidar.
Traditional lidar-based systems rely on an array of lidar hardware to provide an unparalleled view of the world around the vehicle. These systems fire a massive number of invisible laser pulses out into the world and measure how long each one takes to bounce back off surrounding objects.
The result is a realtime visualization of what the world around a vehicle looks like based on the distance of each laser point. The computer translates the points into a 3D representation and is able to identify other vehicles, humans, roads, buildings, and the like as a means of enabling the vehicle to navigate in that world more safely.
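To make the translation from laser pings to a 3D world concrete, here is a minimal sketch of how a single lidar return becomes a point in space: the round-trip time of the pulse gives the distance, and the beam's angles place that distance in 3D. The function name and parameters are illustrative only, not any vendor's actual driver code, which would also handle calibration, intensity, and noise filtering.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in meters per second

def lidar_return_to_point(round_trip_s, azimuth_rad, elevation_rad):
    """Convert one lidar return (round-trip time plus beam angles)
    into an (x, y, z) point relative to the sensor.

    Hypothetical helper for illustration purposes only.
    """
    # The pulse travels out to the object and back, so halve the distance.
    distance_m = SPEED_OF_LIGHT_M_S * round_trip_s / 2.0
    # Standard spherical-to-Cartesian conversion.
    x = distance_m * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = distance_m * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = distance_m * math.sin(elevation_rad)
    return (x, y, z)
```

Sweeping this over millions of returns per second is what produces the familiar rotating point-cloud visualizations associated with lidar-equipped vehicles.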
In recent years, the push towards autonomous driving has resulted in a massive surge in the development of lidar units themselves, and the supporting software solutions that use them. Even so, the cost of lidar systems continues to be prohibitive, with single sensors costing thousands of dollars each. Cameras, on the other hand, only cost a few dollars each, thanks to their prevalence in smartphones, laptops, and the like.
Tesla’s camera-based approach is much cheaper and easier to implement on the hardware side, but requires an insanely complex computer system to translate raw camera inputs and vehicle telematics into intelligence. At a foundational level, the computer can identify lane markings, signs, and other vehicles from a series of sequential static images, also known as a video.
Tesla is taking computer vision to unprecedented levels, analyzing not just images, but individual pixels within the image. “We take a pseudo-lidar approach where you basically predict the depth for every single pixel and you can cast out your pixels,” Karpathy said. Doing this over time replicates much of the functionality of a traditional lidar system, but requires a massive amount of realtime processing power for the image deconstructions to be of any use.
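The "cast out your pixels" idea Karpathy describes can be sketched with the standard pinhole camera model: once the network predicts a depth for every pixel, each pixel is unprojected along its camera ray to that depth, producing a lidar-like point cloud. This is a minimal illustration under that assumption; the function name and parameters (focal lengths fx, fy and principal point cx, cy in pixels) are hypothetical and not Tesla's actual code.

```python
import numpy as np

def depth_map_to_points(depth, fx, fy, cx, cy):
    """Unproject a per-pixel depth map into a 3D point cloud.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), and a predicted depth in meters for every pixel.
    Illustrative sketch of the pseudo-lidar idea, not production code.
    """
    h, w = depth.shape
    # Pixel coordinate grids: us holds column indices, vs holds row indices.
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    # Inverse pinhole projection: each pixel is "cast out" along its ray
    # to the predicted depth, yielding an (x, y, z) point in camera space.
    z = depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Running this on every frame from every camera yields a dense, lidar-style point cloud, which helps explain why so much real-time compute is needed to make the output usable.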
Vehicles are driven in realtime, so it doesn’t do any good to have a system that can make determinations or predictions based on an image if the results are not available instantaneously. Thankfully, Tesla built its own hardware for the third major version of its autonomous driving computer, and it was purpose-built to run Tesla’s code.