BabyVision
🏆 Gemini3-Pro: 49.7% · Visual Reasoning
Performance Timeline
[Figure: timeline chart. x-axis: Release Date (Apr 2025 to Dec 2025); y-axis: Performance (%), 0 to 100. Points are colored by model type (blue = Open, orange = Proprietary); a dashed line marks the human baseline (94.1%). The highest point is Gemini3-Pro-Preview (Google, Nov 18, 2025) at 49.7%. Per-model scores are listed under Complete Benchmarks.]
Complete Benchmarks
| Rank | Model | Type | Company | Release Date | Score |
|---|---|---|---|---|---|
| - | Human (baseline) | Human | - | - | 94.1% |
| 1 | Gemini3-Pro-Preview | Proprietary | Google | Nov 18, 2025 | 49.7% |
| 2 | GPT-5.2 | Proprietary | OpenAI | Dec 11, 2025 | 34.4% |
| 3 | Doubao-1.8 | Proprietary | ByteDance | Dec 18, 2025 | 30.2% |
| 4 | Qwen3VL-235B-Thinking | Open | Alibaba | Sep 23, 2025 | 22.2% |
| 5 | InternVL3.5-241B | Open | OpenGVLab | Aug 26, 2025 | 19.2% |
| 5 | Qwen3-VL-Plus | Proprietary | Alibaba | Sep 23, 2025 | 19.2% |
| 7 | GLM4.6V | Open | Zhipu AI | Dec 8, 2025 | 17.6% |
| 8 | Grok-4 | Proprietary | xAI | Jul 9, 2025 | 16.2% |
| 9 | MimoVL-7B-RL | Open | Xiaomi | Apr 23, 2025 | 15.1% |
| 10 | Step3 | Open | StepFun | Jul 31, 2025 | 14.7% |
| 11 | Claude-4.5-Opus | Proprietary | Anthropic | Nov 24, 2025 | 14.2% |
| 12 | KimiVL-A3B | Open | Moonshot AI | Apr 10, 2025 | 12.4% |
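
The ranks in the table are consistent with standard competition ranking: models with equal scores share a rank, and the next rank is skipped (5, 5, 7). The Python sketch below reproduces that ordering from the listed scores and also computes each model's gap to the human baseline. The function names `competition_rank` and `gap_to_human` are hypothetical helpers introduced for illustration; this is not the leaderboard's actual code.

```python
# Scores (percent) copied from the table above; the human baseline is unranked.
SCORES = {
    "Gemini3-Pro-Preview": 49.7,
    "GPT-5.2": 34.4,
    "Doubao-1.8": 30.2,
    "Qwen3VL-235B-Thinking": 22.2,
    "InternVL3.5-241B": 19.2,
    "Qwen3-VL-Plus": 19.2,
    "GLM4.6V": 17.6,
    "Grok-4": 16.2,
    "MimoVL-7B-RL": 15.1,
    "Step3": 14.7,
    "Claude-4.5-Opus": 14.2,
    "KimiVL-A3B": 12.4,
}
HUMAN_BASELINE = 94.1


def competition_rank(scores):
    """Map each model to its rank; tied scores share a rank and the next rank is skipped."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_score, prev_rank = None, 0
    for position, (model, score) in enumerate(ordered, start=1):
        # A tie reuses the previous rank; otherwise the rank is the 1-based position.
        rank = prev_rank if score == prev_score else position
        ranks[model] = rank
        prev_score, prev_rank = score, rank
    return ranks


def gap_to_human(scores, baseline=HUMAN_BASELINE):
    """Map each model to its distance from the human baseline, in percentage points."""
    return {model: round(baseline - score, 1) for model, score in scores.items()}


if __name__ == "__main__":
    ranks = competition_rank(SCORES)
    gaps = gap_to_human(SCORES)
    for model, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
        print(f"{rank:>2}  {model:<24} {SCORES[model]:>5.1f}%  gap {gaps[model]:.1f} pts")
```

Even the top-ranked model trails the human baseline by more than 40 percentage points under this calculation.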
Citation
For details on BabyVision, please read our paper. If you find it useful in your research, please cite:
@article{babyvision2026,
title={BabyVision: Visual Reasoning Beyond Language},
year={2026}
}