2024-05-06: A GPT-4 Killer in the Wild
An unknown AI chatbot appeared, baffling experts with its advanced capabilities. Who's behind it, and what does it mean for the future of AI?
Here’s today at a glance:
🤔 A GPT-4 Killer in the Wild
The week began with a new and mysterious chatbot making an appearance on lmsys.org (Large Model Systems Organization), a blind taste-testing site for AI language models:
There were some who pointed out that this was still mediocre performance:
The model was extremely capable when tested by folks who had a lot of experience testing models:
Meanwhile, the capability increases were obvious:
Breaking out of its training in context
The theory on why this happens, TL;DR: for a short prompt, the model pattern-matches to a memorized task; for a long prompt, it tries to figure out what’s going on:
Better at code manipulation, as judged by a founder building a code generator
It was too good:
Perhaps because it had perfectly memorized the answers:
Theories abounded: was it a reasoning and planning agent bolted onto the original, now open-sourced GPT-2?
The pinnacle of technical discussion that is 4chan weighed in
Sam Altman played into the whole furor:
Even his edits were scrutinized: gpt2 or gpt-2?
The prompt was dug up
Once chased down, lmsys clarified that (a) it was a new model and (b) it had been secretly introduced for testing in partnership with the developer. Is lmsys getting paid for it? Unclear.
Attention intensified
And soon, the fix was in.
Bye-bye gpt2-chatbot, we hardly knew ye. lmsys updated its policies to disclose:
What have we learned from this?
There are monsters out there—undisclosed groups working on projects with high capability
Capability increases are easier than we thought: this was likely a small organization, since large providers run ethics reviews before release and a core underpinning of AI safety is that humans have a right to know about the AI they are interacting with
Benchmarking organizations deserve greater scrutiny
Axios later (and uselessly) reported: “Speaking on Wednesday at Harvard University, Altman told an audience that the mystery bot is not GPT-4.5, what many see as the likely next major update to GPT-4.”
Ah, the classic confirming non-confirmation.
Then, on Sunday,
At this point, who knows anymore? More drama shall follow.





































