Even some of the best AI can’t beat this new benchmark | TechCrunch


The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems.

The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions touching on subjects like mathematics, humanities, and the natural sciences. To make the evaluation tougher, the questions are in multiple formats, including formats that incorporate diagrams and images.

In a preliminary study, not a single publicly available flagship AI system managed to score better than 10% on Humanity’s Last Exam.

CAIS and Scale AI say they plan to open up the benchmark to the research community so that researchers can “dig deeper into the variations” and evaluate new AI models.


