2024-03-29: Mix and Match: Fish Edition
merger of the unequal
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Here’s today at a glance:
Mix & Match
Sakana AI, the Tokyo-based AI startup founded by former Google Brain and Stability AI researcher David Ha (@hardmaru), who was a mod of r/MachineLearning for many years, and former Google researcher Llion Jones, finally gave its project a grand unveiling:
Model merging
In essence, they took:
A language model that could do math; and
A language model that could output Japanese
Merged them, either by summing their weights (i.e. in parameter space), by changing the inference path the tokens take through the layers of the model (i.e. in data flow space), or both
Evolved the merge process toward a better outcome
Until they arrived at a single language model that could do Japanese math! Something it had never specifically been trained for.
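The steps above can be sketched in a few lines. This is a hypothetical toy setup, not Sakana's actual method: each "model" is just a dict of parameter lists, the parameter-space merge is an elementwise weighted average, and the "evolution" is a simple mutation loop over a single mixing coefficient. (Sakana's real system merges layer-wise with CMA-ES and evaluates on real benchmarks; the `fitness` function here is a stand-in.)

```python
# Toy sketch of parameter-space model merging plus evolutionary search
# over the merge coefficient. All names here (merge, evolve_alpha,
# fitness) are illustrative, not from Sakana's codebase.
import random

def merge(model_a, model_b, alpha):
    """Parameter-space merge: elementwise weighted average of weights."""
    return {name: [alpha * a + (1 - alpha) * b
                   for a, b in zip(model_a[name], model_b[name])]
            for name in model_a}

def evolve_alpha(model_a, model_b, fitness, generations=20, pop=8):
    """Crude (1+lambda)-style evolution of the single merge coefficient."""
    best_alpha = 0.5
    best_fit = fitness(merge(model_a, model_b, best_alpha))
    for _ in range(generations):
        for _ in range(pop):
            # Mutate the current best coefficient, clipped to [0, 1].
            cand = min(1.0, max(0.0, best_alpha + random.gauss(0, 0.1)))
            fit = fitness(merge(model_a, model_b, cand))
            if fit > best_fit:
                best_alpha, best_fit = cand, fit
    return best_alpha

# Demo: pretend the ideal merged model happens to be 70% A, 30% B.
random.seed(0)
a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 4.0]}
target = merge(a, b, 0.7)

def fitness(m):
    # Stand-in for a benchmark score: closer to the target is better.
    return -sum(abs(x - t) for x, t in zip(m["w"], target["w"]))

alpha = evolve_alpha(a, b, fitness)
```

In the real system the search space is far larger (per-layer coefficients, plus which layers to route tokens through), which is why the search-space-reduction hacks mentioned below matter.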
Along the way, the team also:
Figured out a number of hacks to reduce the search space to something small enough to be feasible
Translated one of the widely used math datasets (GSM8k) to Japanese
Created a new benchmark dataset for Japanese visual language models
And they ended up with a 7-billion-parameter model that is state of the art, outperforming 70-billion-parameter Japanese language models.
This is pretty exciting: there are some 500,000 models of various kinds sitting on Hugging Face, a large number of which barely get any use, in a typical Pareto distribution. Beyond the models themselves, it is quite possible that Sakana has now added a toolkit for patching various model capabilities together to make something new.
Notably, the lead researcher on the project was Takuya Akiba, who previously worked with David Ha at Stability. It strikes me that David was able to hire local talent in Tokyo at Stability (and though Akiba-san has three degrees from Todai, he had been working for a Japanese firm before that), saw the writing on the wall eight months ago, and struck out on his own. It does make one wonder how much other trapped talent there is in the world that we just don't really know how to hire or reach.
Sakana also built a culturally aware Japanese image generation model in the same manner.
🗞️ Things Happen
Google Deepmind founder Demis Hassabis gets knighted.
TSMC hints that Nvidia's targeted million-fold performance increase might be possible.