On Coding Agents

It's the end of software engineering as we know it

Prakash

Mar 15, 2024

Here’s today at a glance:

Devin-ing the Future
Things happen
AI artwork of the day

🔮 Devin-ing the Future

Cognition Labs, backed by Founders Fund among others, launched Devin, an AI coding agent.

Cognition @cognition

Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is

1:50 PM · Mar 12, 2024 · 31.4M Views

4.33K Replies · 9.77K Reposts · 42.9K Likes

There is no open demo, but the pre-recorded one shows Devin building websites from single-line prompts, including:

investigating which libraries to use
recovering from errors
using print statements for debugging
reading API documentation
keeping track of various steps in the process
providing audit access

Devin was built by a team led by Scott Wu, an International Olympiad in Informatics (IOI) legend.

iulia @_iuliagroza

I hope competitive programming will get even more popular now that competitive programmers get credits for the products they deliver at top tier ML startups. btw, here is Scott Wu, founder of Cognition, securing 1st place at IOI with a perfect score, not even 10 years ago:

7:55 PM · Mar 13, 2024 · 983 Views

7 Likes

Scott was always legendary, by the way.

Siqi Chen @blader

this is the ceo of cognition 14 years ago the idea that 10x/100x engineers don’t exist is such a cope

12:22 AM · Mar 13, 2024 · 955K Views

158 Replies · 299 Reposts · 3.16K Likes

Devin’s team has 10 IOI gold medals between the 9 co-founders, which is absolutely insane if you think about it.

Andrej Karpathy, former Tesla AI head, had this to say:

# automating software engineering

In my mind, automating software engineering will look similar to automating driving. E.g. in self-driving the progression of increasing autonomy and higher abstraction looks something like:

1. first the human performs all driving actions manually
2. then the AI helps keep the lane
3. then it slows for the car ahead
4. then it also does lane changes and takes forks
5. then it also stops at signs/lights and takes turns
6. eventually you take a feature complete solution and grind on the quality until you achieve full self-driving.

There is a progression of the AI doing more and the human doing less, but still providing oversight. In Software engineering, the progression is shaping up similar:

1. first the human writes the code manually
2. then GitHub Copilot autocompletes a few lines
3. then ChatGPT writes chunks of code
4. then you move to larger and larger code diffs (e.g. Cursor copilot++ style, nice demo here https://youtube.com/watch?v=Smklr44N8QU)
5....

Devin is an impressive demo of what perhaps follows next: coordinating a number of tools that a developer needs to string together to write code: a Terminal, a Browser, a Code editor, etc., and human oversight that moves to increasingly higher level of abstraction.

There is a lot of work not just on the AI part but also the UI/UX part. How does a human provide oversight? What are they looking at? How do they nudge the AI down a different path? How do they debug what went wrong? It is very likely that we will have to change up the code editor, substantially.

In any case, software engineering is on track to change substantially. And it will look a lot more like supervising the automation, while pitching in high-level commands, ideas or progression strategies, in English.

Good luck to the team!

@karpathy

This is not actually a promising quote, as Karpathy seems to imply a 10-year or more ramp to automating code generation fully.

What surprises me is that Devin wraps GPT-4, meaning costs right now are unbearably high, by one estimate somewhere between $120 and $300 an hour:

brian-machado-high-inference @sincethestudy

sanity check on @cognition_labs Per task, Devin is likely doing 10-20 maxed out GPT4-32k calls per minute, thats ~$2-$5/min, or $120-$300/hr depending on Input/Output token ratio. my fellow waterloo interns will gladly outperform Devin for that hourly cost :)

1:52 PM · Mar 13, 2024 · 97.9K Views

31 Replies · 21 Reposts · 683 Likes
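
The hourly range in that tweet is a straight multiplication of the per-minute estimate, which a few lines of Python make explicit. This is only a sketch: the $2 to $5 per minute range is taken from the tweet as given, while `cost_per_minute` and its parameters (`calls_per_minute`, `tokens_per_call`, `price_per_1k_tokens`) are hypothetical helpers for plugging in your own assumptions. No actual token price or call volume is asserted here.

```python
def cost_per_minute(calls_per_minute: float,
                    tokens_per_call: float,
                    price_per_1k_tokens: float) -> float:
    """Hypothetical helper: USD spent per minute on model calls.

    All three inputs are assumptions the reader supplies; none of them
    come from Cognition.
    """
    return calls_per_minute * tokens_per_call / 1000 * price_per_1k_tokens


def hourly_cost(per_minute_usd: float) -> float:
    """Convert a per-minute spend into a per-hour spend."""
    return per_minute_usd * 60


# Per-minute range quoted in the tweet, taken at face value.
low = hourly_cost(2.0)
high = hourly_cost(5.0)
print(f"${low:.0f}-${high:.0f} per hour")  # $120-$300 per hour
```

The takeaway is that the hourly number is extremely sensitive to the per-minute guess: halve the calls or the tokens per call and the hourly cost halves with it.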

So what’s really going on?

Cognition has a ridiculous team
Founders Fund funds them
Devin is a demo product, and examples of usage online are cherry-picked but not doctored
Scaffolded agentic systems making calls to LLMs are going to be a thing
Devin aims to be the coding agent that does that
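
For readers wondering what a "scaffolded agentic system" looks like in practice, here is a minimal sketch of the general pattern: a loop that asks a model for the next action, runs a tool, and feeds the result (including errors) back in. This is not Devin's implementation, which has not been published. Every name here (`run_agent`, `scripted_model`, `read_docs`, `run`) is hypothetical, and `scripted_model` is a hard-coded stand-in for a real LLM call so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str    # tool the model asked for
    arg: str     # argument passed to that tool
    result: str  # what came back (output or error text)


def scripted_model(goal: str, history: List[Step]) -> Dict[str, str]:
    """Hypothetical stand-in for an LLM call; picks the next action.

    A real scaffold would send the goal and history to a model API here.
    """
    if not history:
        return {"tool": "read_docs", "arg": "payments API"}
    last = history[-1]
    if last.result.startswith("error"):
        return {"tool": "run", "arg": "fixed"}   # react to the failure
    if last.tool == "run":
        return {"tool": "finish", "arg": ""}     # last run succeeded
    return {"tool": "run", "arg": "buggy"}


def read_docs(topic: str) -> str:
    """Toy tool: pretend to look up documentation."""
    return f"docs for {topic}"


def run(version: str) -> str:
    """Toy tool: pretend to execute code; the 'buggy' version fails."""
    if version == "buggy":
        raise RuntimeError("NameError in app.py")
    return "ok"


def run_agent(goal: str,
              model: Callable[[str, List[Step]], Dict[str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> List[Step]:
    """Loop: ask the model for an action, run the tool, record the result."""
    history: List[Step] = []
    for _ in range(max_steps):
        action = model(goal, history)
        if action["tool"] == "finish":
            break
        tool = tools.get(action["tool"])
        try:
            result = tool(action["arg"]) if tool else f"error: unknown tool {action['tool']}"
        except Exception as exc:
            # Errors go back into the history so the model can recover.
            result = f"error: {exc}"
        history.append(Step(action["tool"], action["arg"], result))
    return history


TOOLS = {"read_docs": read_docs, "run": run}
trace = run_agent("build a website", scripted_model, TOOLS)
for step in trace:
    print(step.tool, step.arg, "->", step.result)
```

The recorded `trace` doubles as the audit log, and the `max_steps` cap matters in practice because every pass through the loop is a paid model call.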

My guess? GPT-5 is likely to have much of this capability built in. The unfortunate fact of the matter is that OpenAI owes Microsoft $2 trillion in revenue in order to free AGI. That means that OpenAI will have to continuously tackle larger markets. Coding is definitely on the target list. Every team building connective tissue to overcome GPT-4’s limitations is probably in for a surprise.

I can kind of sense that the OpenAI team is almost apologetic at this point, hoping not to destroy too many friendships in the future.

Noam Brown @polynoamial

2024 is going to be an exciting year for AI

Cognition @cognition

11:43 PM · Mar 12, 2024 · 27.1K Views

6 Replies · 9 Reposts · 191 Likes

🗞️ Things Happen

We finally found out how they named MAMBA:

Timothy B. Lee @binarybits

Mamba is such a great name for an AI model. Its predecessor was called the Structured State Space for Sequence Modeling (S4). Mamba adds selective scanning, making it the Selective Scan Structured State Space for Sequence Modeling (S6). So they named it after a snake: SSSSSS.

3:20 PM · Mar 13, 2024 · 1.79K Views

1 Reply · 1 Repost · 12 Likes

Perplexity integrates with Maps and Yelp. It’s a pleasure to watch them ship, and ship, and ship. If they ship fast and hard enough, they may be able to severely wound Google.