Even some of the best AI can’t beat this new benchmark | TechCrunch


The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems.

The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions touching on subjects like mathematics, humanities, and the natural sciences. To make the evaluation tougher, the questions are in multiple formats, including formats that incorporate diagrams and images.

In a preliminary study, not a single publicly available flagship AI system managed to score better than 10% on Humanity’s Last Exam.

CAIS and Scale AI say they plan to open up the benchmark to the research community so that researchers can “dig deeper into the variations” and evaluate new AI models.


