Scale AI Evaluated the Performance of LLMs
June 26, 2024

IBL News | New York
Training data provider Scale AI Inc., which serves OpenAI and Nvidia, published its LLM leaderboard ranking AI models’ performance, rating their capabilities in common use cases, such as generative AI coding, instruction following, math, and multilingualism.
The AI training company didn’t divulge the nature of the prompts it used to evaluate LLMs.
OpenAI’s GPT models rank first in three of the four initial domains, with Anthropic’s Claude 3 Opus grabbing first place in the fourth category. Google LLC’s Gemini models rank joint-first with the GPT models in a few domains.




Many high-profile LLMs have been left out of the evaluations. For instance, AI21 Labs Inc.’s Jurassic and Jamba, Cohere Inc.’s Aya and Command LLMs, and xAI’s Grok are notably absent from these assessments.
Discover more
IBL News is funded by the New York-based, family-owned company ibl.ai. Our stories adhere to the highest ethical standards in journalism and are available to news syndication agencies.








