2024-05-06: A GPT-4 Killer in the Wild

An unknown AI chatbot appeared, baffling experts with its advanced capabilities. Who's behind it, and what does it mean for the future of AI?

Prakash

May 05, 2024

Here’s today at a glance:

A GPT-4 killer in the wild
AI artwork of the day

🤔 A GPT-4 Killer in the Wild

The week began with a mysterious new chatbot appearing on lmsys.org (Large Model Systems Organization), a blind taste-testing site for AI language models:

andrew gao@itsandrewgao

uh.... gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot the IMO is insanely hard. only the FOUR best math students in the USA get to compete prompt + its thoughts 🧵

andrew gao @itsandrewgao

🧵megathread of speculations on "gpt2-chatbot": tuned for agentic capabilities? some of my thoughts, some from reddit, some from other tweeters my early impression is 👇

9:20 PM · Apr 29, 2024 · 780K Views

There were some who pointed out that this was still mediocre performance:

Hieu Pham@hyhieu226

Well... two problems: (1) SIX best math students in the USA get to compete. (2) If I were an IMO judge, the solution would receive a 3 out of 7. A stricter judge might give a 2. A more generous judge might give a 4, but I would protest anything more than that. Context:

andrew gao @itsandrewgao

10:43 PM · Apr 29, 2024 · 76.9K Views

People with a lot of experience testing models found it extremely capable:

Ethan Mollick@emollick

There is a mysterious new model called gpt2-chatbot accessible from a major LLM benchmarking site. No one knows who made it or what it is, but I have been playing with it a little and it appears to be in the same rough ability level as GPT-4. A mysterious GPT-4 class model? Neat!

4:57 PM · Apr 29, 2024 · 580K Views

Meanwhile, the capability gains were obvious:

Angel ❄️@Angaisb_

I asked GPT-4 Turbo and gpt2-chatbot to make a game using JS in a single HTML document. These are the results: The first one is 4 Turbo, the second one is gpt2

11:26 PM · Apr 29, 2024 · 114K Views

It also seemed to break out of its training through in-context learning:

Dimitris Papailiopoulos@DimitrisPapail

I found one task that gpt2-chatbot is better than all other models, and it's completely useless. Early but rapid ascent on the A+B-1 question by @Kangwook_Lee

5:11 PM · Apr 29, 2024 · 65.9K Views

The theory for why this happens, in short: given a short prompt, the model pattern-matches to a memorized task; given a long prompt, it tries to figure out what is actually going on:

Kangwook Lee@Kangwook_Lee

🧵Let me explain why the early ascent phenomenon occurs🔥 We must first understand that in-context learning exhibits two distinct modes. When given samples from a novel task, the model actually learns the pattern from the examples. We call this mode the "task learning" mode.

5:28 PM · Mar 12, 2024 · 72.4K Views

It was also better at code manipulation, as judged by a founder building a code generator:

Chase@ChaseMc67

Can confirm gpt2-chatbot is definitely better at complex code manipulation tasks than Claude Opus or the latest GPT4 Did better on all the coding prompts we use to test new models The vibes are deffs there 👀

5:55 PM · Apr 29, 2024 · 165K Views

It was too good:

Sully@SullyOmarr

Gpt2 drawing unicorns vs Claude opus Whatever this model is, its really good.

6:20 PM · Apr 29, 2024 · 332K Views

Perhaps because it had perfectly memorized the answers:

kache@yacineMTB

>it can draw a unicron the unicorn:

Sully @SullyOmarr

Gpt2 drawing unicorns vs Claude opus Whatever this model is, its really good.

7:10 PM · Apr 29, 2024 · 234K Views

Theories abounded. Was it a reasoning and planning agent bolted onto the original, now open-sourced, GPT-2?

albs—@albfresco

my guess is this mysterious 'gpt2-chatbot' is literally OpenAI's gpt-2 from 2019 finetuned with modern assistant datasets. in which case that means their original pre-training is still amazing and better than everyone else's 4 years later

3:15 PM · Apr 29, 2024 · 621K Views

gfodor.id@gfodor

Ok now I’m wondering Maybe they bolted a new, non-LLM reasoning model onto a GPT-2 trained entirely for the purpose of domain knowledge compression Would explain the name, the domain depth (the main consistent observation) and the overall quality

gfodor.id @gfodor

Imo the odds of this are near zero If not then we are in big trouble

1:41 PM · Apr 30, 2024 · 7.64K Views

4chan, that pinnacle of technical discussion, weighed in:

sandrone@kosenjuu

The chan on “gpt2-chatbot”

2:28 PM · Apr 29, 2024 · 1.24M Views

Sam Altman played into the whole furor:

Even his edits were scrutinized: gpt2 or gpt-2?

The system prompt was dug up:

andrew gao@itsandrewgao

gpt2-chatbot's system prompt, leaked via prompt injection by @BahouPrompts this allegedly us tells that it is a variant of GPT-4 BUT it could be a lie intentionally added by the developers to fool us OR gpt2-chatbot could have hallucinated that so it's not conclusive.

9:53 PM · Apr 29, 2024 · 27.3K Views

Pressed for answers, lmsys clarified that (a) it was a new model and (b) it had been secretly introduced for testing in partnership with the developer. Is lmsys getting paid for it? Unclear.

Attention intensified:

dic@dicnunz

Nice knowing ya "gpt2-chatbot." We'll meet again in another iteration.

6:07 PM · Apr 30, 2024 · 6.41K Views

And soon, the fix was in:

andrew gao@itsandrewgao

gpt2-chatbot was just turned OFFLINE I was just using it half an hour ago! @shaunralston for the find #gpt2 @OpenAI

6:20 PM · Apr 30, 2024 · 488K Views

Bye-bye, gpt2-chatbot, we hardly knew ye. lmsys then updated its policies to disclose:

What have we learned from this?

There are monsters out there: undisclosed groups working on highly capable projects
Capability increases are easier than we thought; this was likely a small organization, since large providers run ethics reviews prior to release, and one of the core underpinnings of AI safety is that humans have a right to know about the AI they are interacting with
Benchmarking organizations deserve greater scrutiny

Axios later (and uselessly) reported: “Speaking on Wednesday at Harvard University, Altman told an audience that the mystery bot is not GPT-4.5, what many see as the likely next major update to GPT-4.”

Ah, the classic confirming non-confirmation.

Then, on Sunday,