The race to develop artificial general intelligence (AGI) has a long way to run, according to Apple researchers who found that leading AI models still have trouble reasoning.
Recent updates to leading AI large language models (LLMs) such as OpenAI’s ChatGPT and Anthropic’s Claude have included large reasoning models (LRMs), but their fundamental capabilities, scaling properties, and limitations “remain insufficiently understood,” said the Apple researchers in a June paper called “The Illusion of Thinking.”
They noted that current evaluations primarily focus on established mathematical and coding benchmarks, “emphasizing final answer accuracy.”
However, this evaluation does not provide insights into the reasoning capabilities of the AI models, they said.
The research contrasts with an expectation that artificial general intelligence is just a few years away.
Apple researchers test “thinking” AI models
To go beyond the standard mathematical benchmarks, the researchers devised different puzzle games to test “thinking” and “non-thinking” variants of Claude Sonnet, OpenAI’s o3-mini and o1, and DeepSeek-R1 and V3 chatbots.
They discovered that “frontier LRMs face a complete accuracy collapse beyond certain complexities,” do not generalize reasoning effectively, and lose their edge as complexity rises, contrary to expectations for AGI capabilities.
AI chatbots are overthinking, say researchers
They found inconsistent and shallow reasoning in the models and also observed overthinking, with AI chatbots generating correct answers early and then wandering into incorrect reasoning.
The researchers concluded that LRMs mimic reasoning patterns without truly internalizing or generalizing them, which falls short of AGI-level reasoning.
The race to develop AGI
AGI is the holy grail of AI development, a state where the machine can think and reason like a human and is on a par with human intelligence.
In January, OpenAI CEO Sam Altman said the firm was closer to building AGI than ever before. “We are now confident we know how to build AGI as we have traditionally understood it,” he said at the time.
In November, Anthropic CEO Dario Amodei said that AGI would exceed human capabilities in the next year or two. “If you just eyeball the rate at which these capabilities are increasing, it does make you think that we’ll get there by 2026 or 2027,” he said.