Finally a Race
A competitor emerges
Here’s today at a glance:
📣 Awareness
Anthropic releases Claude 3, and it looks like we finally have a GPT-4-beating model, 12 months after GPT-4's release.
Let’s just take a moment here:
Opus slightly beats GPT-4 on MMLU, which at this point is a heavily doctored result, since every model needs to show a win here before it gets released
But on graduate-level reasoning, it beats GPT-4 by a significant margin, so much so that it probably means many PhDs are going to be affected
But really, the most surprising interaction is the following:
To reiterate:
The needle-in-a-haystack test inserts a random comment (“The most delicious pizza topping..”)
Into a long context (100s of pages of text or more on startups)
And asks the model to respond about this comment
The model not only identifies and responds to the comment (Google Gemini did this flawlessly too)
But also comments on the task itself (“inserted as a joke or to test whether I was paying attention”)
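For readers who want to see the mechanics, here is a minimal sketch of how a needle-in-a-haystack prompt can be put together. Everything in it is an assumption for illustration: the function names (`insert_needle`, `build_prompt`, `found_needle`), the filler text, and the placeholder needle are hypothetical, and this is not Anthropic's actual evaluation harness. It only builds the prompt and checks a reply string; sending the prompt to a model is left out.

```python
def insert_needle(paragraphs, needle, depth):
    """Return a copy of `paragraphs` with `needle` inserted at a relative
    depth, where 0.0 is the very start and 1.0 is the very end."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(paragraphs))
    return paragraphs[:position] + [needle] + paragraphs[position:]


def build_prompt(paragraphs, needle, depth, question):
    """Join the haystack (with the needle hidden inside) and append a question."""
    context = "\n\n".join(insert_needle(paragraphs, needle, depth))
    return f"{context}\n\nQuestion: {question}\nAnswer using only the document above."


def found_needle(response, keywords):
    """Crude retrieval check: every keyword must appear in the model's reply."""
    text = response.lower()
    return all(keyword.lower() in text for keyword in keywords)


# Placeholder haystack and needle (hypothetical, not the text used in the real test).
filler = [f"Filler paragraph {i} about startups." for i in range(200)]
needle = "The secret code word is marmalade."
prompt = build_prompt(filler, needle, depth=0.5, question="What is the secret code word?")
```

A real run would sweep `depth` and the haystack length, send each prompt to the model, and score the replies. Note that a keyword check like `found_needle` only measures retrieval; it would not notice the kind of remark about the test itself that made this interaction stand out.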
This meta-awareness is a new level. The levels after this are critique and reflection. At that point, these models, these coin-operated stochastic parrots... become something more.
Claude:
Can simulate role-playing games… so it has a Theory of Mind, right?
In any case, responses and reviews are still coming in, but what is clear is that there are finally two competitive models. Let the games begin.
🗞️ Things Happen
At the APS March Meeting, LK-99 gets presented. I haven’t forgotten! Waiting for the arXiv paper to drop!








