Elon Wuz Right
TikTok team proves Elon correct: images alone would be sufficient
Take a single photo and get an understanding of the 3D positioning of the objects in it… better than LiDAR. That’s the promise of this work from TikTok, which was done during Lihe Yang’s PhD internship (!) at the company. Its title, Depth Anything, is a tribute to Meta’s Segment Anything.

Elon, of course, spotted this way early:
Hardly anyone believed Elon when he said Tesla didn’t need LiDAR or radar and that images alone would be sufficient, but it seems the TikTok team has proven him correct.
Features of this work:
Goal was to build a foundation model for depth estimation from a single image
Did not rely on the classical approach of collecting large amounts of accurately measured ground-truth depth maps to train the model on
Instead, collected a large unlabeled dataset of 62 million images, which would form the basis for training the “student” model
Then built an annotation model to label this dataset
The annotation model, the “teacher”, was trained on a labeled dataset of 1.5 million images
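This worked because of scale! They had many failures along the way: simply adding the pseudo-labeled images did not help at first, and the student only improved once it was given a harder target (strongly perturbed inputs) plus auxiliary supervision to inherit semantic priors from a pre-trained encoder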
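The teacher/student recipe in the list above can be sketched as a toy. Everything in it is hypothetical: a single number stands in for an image, a straight line stands in for a depth network, the function names are made up, and small input jitter stands in for the strong augmentations the student is trained with. The 15 vs 620 sample counts only mirror the 1.5M vs 62M ratio. This is not the Depth Anything code, and a model this simple cannot show why scale helps; it only shows how the data flows from labeled set to teacher to pseudo-labels to student.

```python
import random


def true_depth(x: float) -> float:
    # Hypothetical hidden "depth" for a 1-D stand-in image feature x.
    return 2.0 * x + 1.0


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict(model: tuple[float, float], x: float) -> float:
    a, b = model
    return a * x + b


def teacher_student(n_labeled: int = 15, n_unlabeled: int = 620,
                    noise: float = 0.3, seed: int = 0):
    rng = random.Random(seed)

    # 1. Small labeled set: inputs with noisy "measured" depth. Train the teacher.
    xl = [rng.uniform(0, 10) for _ in range(n_labeled)]
    yl = [true_depth(x) + rng.gauss(0, noise) for x in xl]
    teacher = fit_line(xl, yl)

    # 2. Large unlabeled set, annotated by the teacher on clean inputs (pseudo-labels).
    xu = [rng.uniform(0, 10) for _ in range(n_unlabeled)]
    yu = [predict(teacher, x) for x in xu]

    # 3. Student trains on labeled + pseudo-labeled data, but sees jittered inputs
    #    (a crude stand-in for strong augmentation making its task harder).
    xs = xl + xu
    ys = yl + yu
    xs_perturbed = [x + rng.gauss(0, 0.05) for x in xs]
    student = fit_line(xs_perturbed, ys)
    return teacher, student


if __name__ == "__main__":
    teacher, student = teacher_student()
    print("teacher (slope, intercept):", teacher)
    print("student (slope, intercept):", student)
```

In the real setting the student is a large neural network seeing many more images than the teacher was trained on, which is where the benefit of scale is supposed to come from; the linear toy above simply ends up close to its teacher.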
The exciting part of all of this is that it looks like vision alone is enough for a lot of tasks in the physical world.