Hugging Face Launches APEX-Agents Leaderboard to Evaluate Open-Source AI Models

Hugging Face · Friday, May 1, 2026

Hugging Face has introduced a leaderboard for APEX-Agents, a benchmark that assesses how well open-source AI models perform professional tasks typically handled by consultants, lawyers, and bankers. The benchmark evaluates AI agents on their ability to navigate realistic work environments, using documents, spreadsheets, and other tools to complete complex, long-horizon tasks. APEX-Agents uses rubric-based grading and features live updates for new models. It is available as open source on GitHub. The content is intended for research and educational purposes only, and it carries disclaimers about the hypothetical nature of the scenarios used in evaluations.

More Articles From This Day

Regulation · AI · Child Safety · OpenAI

Senate Panel Supports AI Child Safety Bill Targeting OpenAI and Meta

A Senate panel has backed a bill aimed at enhancing child safety in artificial intelligence, specifically targeting companies such as OpenAI and Meta. The bill responds to growing concern over the risks that AI technologies pose to children, and it reflects increasing regulatory scrutiny of the AI sector as lawmakers seek protective measures for vulnerable populations. The discussions highlight the need for guidelines and standards as AI applications evolve rapidly.

Anthropic Launches Claude Security Tool for Enhanced Cybersecurity Vulnerability Scanning

Google Expresses Pride in Pentagon AI Contract Following Internal Backlash

Hugging Face Launches GPT-5.5 in ml-intern App as a Research Partner

Hugging Face Launches AgentTrove, a New Dataset Featuring 1.7 Million Samples

Introducing HalluCiteChecker: A Toolkit for Detecting Hallucinated Citations in Scientific Papers