Stranger in a Strange Land
Here’s today at a glance:
Grok Is Released
Elon Musk, fresh from filing a lawsuit accusing OpenAI of being closed, promised to open source xAI’s LLM Grok this week:
As Pi Day (3.14, March 14th) passed, an accusatory peanut gallery sprang up. Where, oh where, is Grok, we cried out.
We got to Sunday, and finally:
The joke being, of course, that Twitter has an annoying porn-spam problem, with ░P░U░S░S░Y░I░N░B░I░O░ being the first reply to any popular tweet.
The bio in question led to a torrent link with the model weights.
ChatGPT hilariously responded:
In any case, Grok is middle-open-source?
The architecture:
Grok-1 is a 314B-parameter Mixture-of-Experts (MoE) transformer. 🧐 What we know so far:
🧠 Base model, not fine-tuned
⚖️ Apache 2.0 license
🧮 314B MoE with 25% active per token
📊 Per the initial announcement: 73% on MMLU, 62.9% on GSM8K, and 63.2% on HumanEval.
— @_philschmid
This is a big model.
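Where does “25% active” come from? Per xAI’s release notes, Grok-1 routes each token through 2 of its 8 experts, so roughly a quarter of the expert weights are touched per token. A minimal sketch of that top-k gating idea in Python (toy dimensions and random weights, not Grok’s actual code):

```python
import numpy as np

def moe_layer(x, experts, gate_w, top_k=2):
    """Route one token through only the top-k of n experts."""
    logits = x @ gate_w                    # one router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                   # softmax over just the chosen k
    # Only k of the n expert matrices are multiplied for this token;
    # that's why "active parameters" is far below total parameters.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

# Toy numbers: 8 experts, 2 active per token -> 2/8 = 25% of expert params used
d, n = 16, 8
rng = np.random.default_rng(0)
experts = [rng.normal(size=(d, d)) for _ in range(n)]
gate_w = rng.normal(size=(d, n))
print(moe_layer(rng.normal(size=d), experts, gate_w).shape)  # (16,)
```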
Notes:
It’s a base model, meaning it’s raw material that can be shaped by instruction tuning and fine-tuning (for anything from moderation to profanity, from left wing to right wing)
It’s not a terribly capable model (not surprising, given that it predates any instruction tuning or fine-tuning)
The size… is so large that you’d need 8 Nvidia H100s, roughly $300k of hardware, just to do inference, unless optimizations like quantization lower the requirements in the future (see the back-of-the-envelope sketch after this list)
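Where the 8×H100 figure comes from, as a back-of-the-envelope (weights only; real serving also needs memory for the KV cache and activations, so treat these as lower bounds):

```python
# Rough VRAM math for serving a 314B-parameter model at various precisions.
PARAMS = 314e9
H100_VRAM_GB = 80

for name, bytes_per_param in [("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    gpus = -(-weights_gb // H100_VRAM_GB)  # ceiling division
    print(f"{name}: ~{weights_gb:.0f} GB of weights -> at least {gpus:.0f} H100s")

# fp16/bf16: ~628 GB of weights -> at least 8 H100s
# int8:      ~314 GB of weights -> at least 4 H100s
# 4-bit:     ~157 GB of weights -> at least 2 H100s
```

At 16-bit precision the weights alone are ~628 GB, which only just fits in the 640 GB of eight 80 GB H100s; quantization is exactly the kind of optimization that could cut that requirement.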
🗞️ Things Happen
NVIDIA GTC kicks off in San Jose today. Jensen Huang is going to appear on stage, as will all of the Transformer paper authors.