A recent paper from Hugging Face introduces SU-01, a 30B-A3B reasoning model that reaches gold-medal-level performance on mathematics and physics competitions, including the International Mathematical Olympiad (IMO) and the International Physics Olympiad (IPhO). The approach combines a reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling to strengthen the model's proof-search capabilities. Trained on 340,000 trajectories of fewer than 8K tokens each and refined through 200 reinforcement learning steps, SU-01 can work through complex problems with trajectories exceeding 100,000 tokens. The authors emphasize that the model generalizes well beyond traditional mathematics and physics domains, and they have open-sourced both the code and the model.
New 30B-A3B Reasoning Model Achieves Gold-Medal Performance in Olympiads
More Articles From This Day
OpenClaw Creator Spends $1.3M on OpenAI Tokens Within a Month
The creator of OpenClaw has reportedly spent $1.3 million on OpenAI tokens over 30 days, a sign of the substantial funding behind the project. The spending reflects growing interest in generative AI technologies and their applications. The announcement drew 122 comments, with lively discussion of what spending at this scale implies for AI.
