2024-03-29: Mix and Match: Fish Edition
merger of the unequal
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Here’s today at a glance:
Mix & Match
Sakana AI, the Tokyo-based AI startup founded by former Google Brain and Stability AI researcher David Ha (@hardmaru), who was a mod of r/MachineLearning for many years, and former Google researcher Llion Jones, finally gave its project a grand unveiling:
Model merging
In essence, they took:
A language model that could do math; and
A language model that could output Japanese
Merged them, either by summing their weights (i.e. in parameter space), by changing the inference path the tokens take through the layers of the model (i.e. in data flow space), or both
Evolved the merge process toward a better outcome
Until they arrived at a single language model that could do Japanese math! Something it had never specifically been trained for.
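The steps above can be sketched in a few lines. This is a hypothetical toy setup, not Sakana's actual method: each "model" is just a dict of parameter lists, the parameter-space merge is an elementwise weighted average, and the "evolution" is a simple mutation loop over a single mixing coefficient. (Sakana's real system merges layer-wise with CMA-ES and evaluates on real benchmarks; the `fitness` function here is a stand-in.)

```python
# Toy sketch of parameter-space model merging plus evolutionary search
# over the merge coefficient. All names here (merge, evolve_alpha,
# fitness) are illustrative, not from Sakana's codebase.
import random

def merge(model_a, model_b, alpha):
    """Parameter-space merge: elementwise weighted average of weights."""
    return {name: [alpha * a + (1 - alpha) * b
                   for a, b in zip(model_a[name], model_b[name])]
            for name in model_a}

def evolve_alpha(model_a, model_b, fitness, generations=20, pop=8):
    """Crude (1+lambda)-style evolution of the single merge coefficient."""
    best_alpha = 0.5
    best_fit = fitness(merge(model_a, model_b, best_alpha))
    for _ in range(generations):
        for _ in range(pop):
            # Mutate the current best coefficient, clipped to [0, 1].
            cand = min(1.0, max(0.0, best_alpha + random.gauss(0, 0.1)))
            fit = fitness(merge(model_a, model_b, cand))
            if fit > best_fit:
                best_alpha, best_fit = cand, fit
    return best_alpha

# Demo: pretend the ideal merged model happens to be 70% A, 30% B.
random.seed(0)
a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 4.0]}
target = merge(a, b, 0.7)

def fitness(m):
    # Stand-in for a benchmark score: closer to the target is better.
    return -sum(abs(x - t) for x, t in zip(m["w"], target["w"]))

alpha = evolve_alpha(a, b, fitness)
```

In the real system the search space is far larger (per-layer coefficients, plus which layers to route tokens through), which is why the search-space-reduction hacks mentioned below matter.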
Along the way, the team also:
Figured out a number of hacks to reduce the search space to something small enough to be feasible
Translated one of the widely used math datasets (GSM8k) to Japanese
Created a new benchmark dataset for Japanese visual language models
And they ended up with a 7-billion-parameter model that is state of the art, outperforming 70-billion-parameter Japanese language models.
This is pretty exciting: there are some 500,000 models of various kinds sitting on Hugging Face, a large number of which barely get any use, in a typical Pareto distribution. Beyond the models themselves, it is quite possible that Sakana has now added a toolkit for patching various model capabilities together to make something new.
Notably, the lead researcher on the project was Takuya Akiba, who previously worked with David Ha at Stability. It strikes me that David was able to hire local talent in Tokyo at Stability (and though Akiba-san has three degrees from Todai, he had been working for a Japanese firm before that), saw the writing on the wall eight months ago, and struck out on his own. It does make one wonder how much other trapped talent there is in the world that we just don't really know how to hire or reach.
Sakana also built a culturally aware Japanese image generation model in the same manner.
🗞️ Things Happen
Google Deepmind founder Demis Hassabis gets knighted.
TSMC hints that Nvidia's targeted million-fold performance increase might be possible.