
Here’s the rough visual comparison you asked for:
- Humans cluster around 89–100 (the world's actual population average).
- ChatGPT spans widely, from ≈82 to 150, depending on the test and domain.
- Grok appears in a narrower range, ≈110–130.

This gives a rough sense that current large language models can outperform the average human on certain reasoning tasks, yet remain inconsistent and do not match human-style general intelligence.