Benchmarks

BabyVision

🏆 Gemini3-Pro: 49.7% · Visual Reasoning

Performance Timeline

[Figure: BabyVision score (Performance, %, axis up to 100) plotted against model Release Date, Apr 2025 to Dec 2025. Blue = Open, Orange = Proprietary. Dashed line = Human baseline (94.1%). Models plotted: Kimi-VL, MiMo-VL, Grok-4, Step3, InternVL3.5, Qwen3-VL, Qwen3-VL-Plus, Gemini 3 Pro, Claude 4.5, GLM-4.6V, GPT-5.2, Doubao-1.8.]

Can MLLMs See Like a 3-Year-Old?

Rank	Model	Type	Company	Release Date	Score
-	Human (baseline)	Human	-	-	94.1%
1	Gemini3-Pro-Preview	Proprietary	Google	Nov 18, 2025	49.7%
2	GPT-5.2	Proprietary	OpenAI	Dec 11, 2025	34.4%
3	Doubao-1.8	Proprietary	ByteDance	Dec 18, 2025	30.2%
4	Qwen3VL-235B-Thinking	Open	Alibaba	Sep 23, 2025	22.2%
5	InternVL3.5-241B	Open	OpenGVLab	Aug 26, 2025	19.2%
5	Qwen3-VL-Plus	Proprietary	Alibaba	Sep 23, 2025	19.2%
7	GLM4.6V	Open	Zhipu AI	Dec 8, 2025	17.6%
8	Grok-4	Proprietary	xAI	Jul 9, 2025	16.2%
9	MimoVL-7B-RL	Open	Xiaomi	Apr 23, 2025	15.1%
10	Step3	Open	StepFun	Jul 31, 2025	14.7%
11	Claude-4.5-Opus	Proprietary	Anthropic	Nov 24, 2025	14.2%
12	KimiVL-A3B	Open	Moonshot AI	Apr 10, 2025	12.4%
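
As a rough illustration of how the table can be read, the sketch below finds the best open and best proprietary model and computes each one's gap to the human baseline. The scores are copied from the leaderboard above; the names `ROWS`, `HUMAN_BASELINE`, `best_by_type`, and `gap_to_human` are hypothetical helpers introduced here for illustration and are not part of any BabyVision tooling.

```python
# Illustrative sketch only: scores are transcribed from the leaderboard above,
# and all names below are hypothetical, not part of the BabyVision codebase.

HUMAN_BASELINE = 94.1  # human score in percent, from the table

# (model, type, score in percent)
ROWS = [
    ("Gemini3-Pro-Preview", "Proprietary", 49.7),
    ("GPT-5.2", "Proprietary", 34.4),
    ("Doubao-1.8", "Proprietary", 30.2),
    ("Qwen3VL-235B-Thinking", "Open", 22.2),
    ("InternVL3.5-241B", "Open", 19.2),
    ("Qwen3-VL-Plus", "Proprietary", 19.2),
    ("GLM4.6V", "Open", 17.6),
    ("Grok-4", "Proprietary", 16.2),
    ("MimoVL-7B-RL", "Open", 15.1),
    ("Step3", "Open", 14.7),
    ("Claude-4.5-Opus", "Proprietary", 14.2),
    ("KimiVL-A3B", "Open", 12.4),
]


def best_by_type(rows):
    """Return {type: (model, score)} for the top-scoring model of each type."""
    best = {}
    for model, kind, score in rows:
        if kind not in best or score > best[kind][1]:
            best[kind] = (model, score)
    return best


def gap_to_human(score, human=HUMAN_BASELINE):
    """Percentage points separating a model score from the human baseline."""
    return round(human - score, 1)


if __name__ == "__main__":
    for kind, (model, score) in best_by_type(ROWS).items():
        gap = gap_to_human(score)
        print(f"{kind}: {model} at {score}% ({gap} points below human)")
```

On these numbers, even the top-ranked model remains more than 40 percentage points below the human baseline, and the best open model trails the best proprietary one by a wide margin.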

For details on BabyVision, please read our paper. If you find it useful in your research, please cite:

@article{babyvision2026,
  title={BabyVision: Visual Reasoning Beyond Language},
  year={2026}
}