GPT-4 Beater
Yesterday, the team at Alibaba released the first model to surpass GPT-4 on some benchmarks.
Qwen-VL-Max matches or surpasses GPT-4 and Gemini Ultra on some metrics… and is open source, so you can try it for yourself on Hugging Face (overloaded since launch, so maybe in a few days!)
Notably:
It’s primarily an image comprehension model - which makes sense as the Chinese language is far more vision-centric with much more complex UI design than
It can annotate and respond with images, for example in response to an input image and the prompt “Locate the red car”.
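For readers who want to use this kind of annotation programmatically, here is a minimal sketch of turning a grounding reply into pixel coordinates. It rests on assumptions: the post does not show the raw model output, so the `<ref>…</ref><box>(x1,y1),(x2,y2)</box>` markup and the 0 to 1000 coordinate scale are taken from how the open Qwen-VL models are described as formatting boxes, and may not match what Qwen-VL-Max returns. The helper names `parse_boxes` and `to_pixels` and the sample reply are made up for illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    label: str
    x1: int
    y1: int
    x2: int
    y2: int


# Assumed markup (not confirmed by the post): <ref>label</ref><box>(x1,y1),(x2,y2)</box>
BOX_PATTERN = re.compile(
    r"<ref>(.*?)</ref>\s*<box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
)


def parse_boxes(reply: str) -> List[Box]:
    """Extract every labelled box found in a model reply."""
    boxes = []
    for match in BOX_PATTERN.finditer(reply):
        label = match.group(1).strip()
        x1, y1, x2, y2 = (int(v) for v in match.groups()[1:])
        boxes.append(Box(label, x1, y1, x2, y2))
    return boxes


def to_pixels(box: Box, width: int, height: int, scale: int = 1000) -> Tuple[int, int, int, int]:
    """Map coordinates on an assumed 0..scale grid to pixel positions."""
    return (
        box.x1 * width // scale,
        box.y1 * height // scale,
        box.x2 * width // scale,
        box.y2 * height // scale,
    )


# Hypothetical reply to the prompt "Locate the red car" for a 1280x720 image.
sample_reply = "<ref>the red car</ref><box>(120,450),(480,760)</box>"
found = parse_boxes(sample_reply)
pixel_box = to_pixels(found[0], width=1280, height=720)
print(found[0].label, pixel_box)
```

The resulting pixel tuple can be passed to any drawing library to overlay the rectangle on the original image.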

It can understand the significance of parts of an image. One can only imagine an optimized model deployed locally for autonomous driving, like the example below:
It understands flowcharts, diagrams, charts, and graphs, and can reason about them. It can solve grade school math problems (who needs AGI after this?)

Understands and can explain flowcharts
It can understand, parse, and transform chart data. The example below could replace the McKinsey analyst class of 2025.
It can reason from diagrams. The example below is analogous to Raven’s Standard Progressive Matrices, a widely used intelligence test, so we are not far from one of these models coolly beating humans at that test.
It is very good at extracting structured data from images. The example below is a product many startups worked on in the last decade.
Qwen has also been trained on dense text information retrieval from images with unusual aspect ratios, like the example below:
All in all, Qwen is going to be a very useful model for a lot of enterprise tasks and seems to have beaten GPT-4V on some of these. The language capability is not yet at GPT-4 standard… but the intelligence is, so maybe it’s not that far behind.
This is definitely acceleration, and we are happy China is open-sourcing it.










