Meta Releases Llama 3, Two Models with 8 Billion and 70 Billion Parameters

IBL News | New York

Meta this week released Llama 3 in two sizes: Llama 3 8B, which contains 8 billion parameters, and Llama 3 70B, with 70 billion parameters. (Models with higher parameter counts are generally more capable than those with lower counts.)

Llama 3 models are now available for download. They will soon be hosted in managed form across a wide range of cloud platforms, including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM’s WatsonX, Microsoft Azure, Nvidia’s NIM, and Snowflake. In the future, versions of the models optimized for hardware from AMD, AWS, Dell, Intel, Nvidia, and Qualcomm will also be made available.

Llama 3 models power Meta’s Meta AI assistant on Facebook, Instagram, WhatsApp, Messenger, and the web.

“Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across core [large language model] capabilities such as reasoning and coding,” Meta wrote in a blog post.


The company said that the 8B and 70B models, trained on two custom-built 24,000-GPU clusters, are among the best-performing generative AI models available today. To support this claim, Meta pointed to scores on popular AI benchmarks such as MMLU (which attempts to measure knowledge), ARC (which attempts to measure skill acquisition), and DROP (which tests a model’s reasoning over chunks of text).

Llama 3 8B bests other open models such as Mistral’s Mistral 7B and Google’s Gemma 7B, both of which contain 7 billion parameters, on at least nine benchmarks: MMLU, ARC, DROP, GPQA (a set of biology-, physics- and chemistry-related questions), HumanEval (a code generation test), GSM-8K (math word problems), MATH (another mathematics benchmark), AGIEval (a problem-solving test set) and BIG-Bench Hard (a commonsense reasoning evaluation).


Llama 3 70B beats Gemini 1.5 Pro on MMLU, HumanEval, and GSM-8K. While it doesn’t rival Anthropic’s most performant model, Claude 3 Opus, it scores better than the second-weakest model in the Claude 3 series, Claude 3 Sonnet, on five benchmarks (MMLU, GPQA, HumanEval, GSM-8K and MATH).


Meta also developed its own test set covering use cases ranging from coding and creative writing to reasoning and summarization. On that test set, Llama 3 70B came out on top against Mistral’s Mistral Medium model, OpenAI’s GPT-3.5, and Claude 3 Sonnet.
