Recent evaluations show that the latest AI models, including GPT-5.5 and Opus 4.7, score below 1% on the ARC-AGI-3 benchmark: GPT-5.5 reached 0.43% and Opus 4.7 reached 0.18%. The analysis identifies three main failure modes: a true local effect with a false world model, an incorrect level of abstraction from the training data, and a failure to reinforce the reward despite solving the level. Further insights can be found in the full analysis linked in the discussion.
Current AI Models Achieve Below 1% on ARC-AGI-3 Benchmark
More Articles From This Day
Elon Musk Claims Deception in OpenAI Trial, Warns of AI Threats
In the ongoing trial between Elon Musk and OpenAI, Musk accused CEO Sam Altman and President Greg Brockman of misleading him into funding the company, claiming he provided $38 million to support a nonprofit aimed at benefiting humanity. He expressed concerns that AI could pose existential threats, referencing his own AI company, xAI, which uses OpenAI's models. Musk is seeking to oust Altman and Brockman and return OpenAI to its original nonprofit status. The trial's outcome could significantly affect OpenAI's anticipated IPO, while xAI is projected to go public as part of Musk's SpaceX with a target valuation of $1.75 trillion. Musk's testimony emphasized his commitment to AI safety; OpenAI's legal team countered by suggesting that his motives were competitive rather than altruistic.
