Sora Roundup
Reviewing the leap forward in video models
Here's today at a glance:
Sora Roundup
On Thursday, Feb 15th, OpenAI unveiled its text-to-video model, Sora, and the world moved forward again. It was not completely unexpected, as many, many teams across the industry were working on individual aspects of video generation. But still… it was a great leap forward. It is very hard to generate video for more than 2 seconds, let alone up to a minute, without weird morphing artifacts or features appearing and disappearing.
Comparisons
Capabilities
I just want to take a moment to explore the capabilities of the Sora model. It shows:
Clear signs of having been trained on the output of a 3D engine
It can generate multiple videos in the same "world" at the same time. This means that eventually, you can just imagine a scene from every possible angle, without needing cameras everywhere.
Sequential scene changes in the same story world
Storytelling
Worryingly realistic-looking humans
Sora allows video-to-video editing
Same Data Source
The comparisons between Sora and Midjourney revealed that they seemed to have been trained on the same data. When we dream in latent space, we have similar dreams.
In effect, the similarity in training data causes convergence to the same district of latent space. Another example below:
We Don't Know How To Do This
Meanwhile, Yann LeCun, Facebook's AI chief, had declared in the Middle East just days prior that generative AI would never reach this milestone:
Yann was out and about on Twitter defending his statements, and to be honest, he may still be right in the end, but the juxtaposition is a tad embarrassing.
In any case, there was an incredible amount of cope among real-world animators.
Though everyone should know better at this point.
Build Alpha
The best information on the Sora build came from the co-author of the underlying paper, Saining Xie:
He goes on to speculate that Sora might only be a 3 billion parameter model, which implies:
not that many GPUs utilized for generation
fast inference
cheap
lots more runway to improve
and quickly
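The "cheap and fast" implications can be sanity-checked with simple arithmetic. A minimal back-of-envelope sketch, assuming the speculated 3 billion parameters stored in fp16 (both figures are assumptions, not confirmed specs):

```python
# Rough memory footprint of a hypothetical 3B-parameter model.
# Assumption: fp16 weights, i.e. 2 bytes per parameter.
params = 3e9
bytes_per_param = 2  # fp16
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~6 GB
```

At roughly 6 GB of weights, such a model would fit comfortably on a single consumer GPU, which is what makes the "not many GPUs, fast inference, cheap" chain of reasoning plausible.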
There are real questions on how closely Sora is simulating reality, with some converting Sora video into 3D scrollable representations known as radiance fields:
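The core of a radiance field is volume rendering: sampling density and color along each camera ray and alpha-compositing them into a pixel. A minimal NumPy sketch of that quadrature (the general NeRF-style formulation, not any specific Sora-to-3D pipeline):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite one camera ray.
    sigmas: (N,) densities, colors: (N, 3) RGB, deltas: (N,) segment lengths."""
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each segment
    # Transmittance: probability the ray reaches each segment unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)  # composited pixel color

# Empty first segment, dense green second segment -> pixel is nearly pure green.
rgb = render_ray(
    sigmas=np.array([0.0, 10.0]),
    colors=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    deltas=np.array([1.0, 1.0]),
)
```

Fitting such a field to frames of a single generated video is what lets people "scroll" through a Sora scene in 3D.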
OpenAI's first intern, Dr. Jim Fan, was roundly shouted down, but persisted in arguing that Sora must be performing both world and physics modeling.
Poor Google
Meanwhile, poor Google achieved 5-second videos in late January and has still not released the model to the public. Compare:
The final Sora rundown
Leveraging spacetime patches, Sora offers a unified representation for large-scale training across various durations, resolutions, and aspect ratios.
It generates high-definition content, showcasing its prowess in handling videos and images with dynamic aspect ratios.
It excels in framing and composition, outperforming traditional square-cropped training methods.
Utilizing descriptive video captions, Sora achieves higher text fidelity, making it adept at following detailed user prompts for video generation.
From animating static images to extending videos, Sora showcases a wide range of editing capabilities.
Sora's training reveals emergent properties like 3D consistency and long-range coherence, hinting at its potential as a simulator for the physical and digital world.
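The "spacetime patches" idea above is essentially ViT-style patchification extended along the time axis: a video tensor is sliced into small (time, height, width) blocks, each flattened into one token. A hedged sketch of that slicing (the patch sizes and shapes here are illustrative assumptions, not Sora's actual configuration):

```python
import numpy as np

def spacetime_patches(video, pt=2, ph=16, pw=16):
    """Slice a video of shape (T, H, W, C) into pt x ph x pw spacetime patches.
    Returns a (num_patches, pt*ph*pw*C) token sequence.
    Assumes T, H, W are divisible by the patch sizes."""
    T, H, W, C = video.shape
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)  # group the three patch-grid axes first
    return v.reshape(-1, pt * ph * pw * C)

# 8 frames of 64x64 RGB -> 4 x 4 x 4 = 64 tokens of dimension 2*16*16*3 = 1536.
tokens = spacetime_patches(np.zeros((8, 64, 64, 3)))
print(tokens.shape)  # (64, 1536)
```

Because the token count just scales with the number of patches, the same representation handles different durations, resolutions, and aspect ratios, which is the flexibility the rundown describes.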
All Known Soras
A supercut of all known and confirmed Sora videos with their associated prompts.
Things Happen
Legendary chip architect Jim Keller responds to Sam Altman's plan to raise $7 trillion to make AI chips: "I can do it for less than $1 trillion." Everyone is targeting chips at this point.
Geoffrey Hinton: 200,000 people a year die of incorrect medical diagnoses in the United States. AI will fix that in the next 10 years.