OpenAI o3 Reportedly Has an IQ of 157, on Par with Einstein, but That Does Not Prove It Is Smarter than Humans

OpenAI o3 has been reported to have an IQ as high as 157, on a par with Einstein. However, measuring AI by human IQ standards has limitations. Traditional IQ tests were designed to evaluate human cognitive abilities, and using them to judge AI is methodologically biased. The human brain and AI differ significantly: in essence, AI is a probability machine built on specific algorithms. Today's AI imitates some human cognitive functions but can still make basic mistakes, such as being unable to tell which of 9.8 and 9.11 is larger.

Humans have long been looking for a suitable evaluation system to quantify the intelligence of AI. Both the Turing test and the Mensa test have shortcomings. To show the progress of AI, we can shift the focus of evaluation to the ability to solve practical problems. There are many benchmark tests for this, but they have problems of their own. Models may have “previewed” the questions in advance, which strips the results of their reference value, and scores tend to “saturate” as they approach the maximum. For instance, the MMLU results of GPT-3.5, GPT-4, and OpenAI o1 seem to show progress slowing down, when in fact the models have simply conquered the test.

There are currently two ways around this: blind testing by users and introducing new benchmarks. The ARC-AGI test, created by François Chollet, is widely regarded as an important standard for measuring AGI capabilities. However, even if OpenAI o3 scores well on this test, that does not mean AGI has been achieved. In conclusion, we should think about how to make AI better serve the actual needs of human society. That is the most meaningful dimension for evaluating the progress of AI.
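The 9.8 versus 9.11 comparison mentioned above can be made concrete with a small sketch. It only shows that the same two strings order differently depending on how they are read: as decimal numbers or as dotted version numbers. The function names are hypothetical and chosen for illustration, and the sketch makes no claim about how any particular model handles the question internally.

```python
from decimal import Decimal


def larger_as_decimal(a: str, b: str) -> str:
    """Return the larger string when both are read as decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Return the larger string when both are read as dotted version numbers."""
    def key(s: str) -> tuple:
        # "9.11" becomes (9, 11), so each dotted part is compared as an integer.
        return tuple(int(part) for part in s.split("."))

    return a if key(a) > key(b) else b


if __name__ == "__main__":
    print(larger_as_decimal("9.8", "9.11"))   # 9.8, because 9.80 > 9.11
    print(larger_as_version("9.8", "9.11"))   # 9.11, because 11 > 8
```

Read as numbers, 9.8 is larger; read as version labels, 9.11 comes later. A person resolves this from context without effort, which is why the mistake looks so basic when an AI makes it.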