IBL News | New York
Cerebras Systems, the Silicon Valley–based maker of a dedicated AI computer built around the world’s largest computer chip, has released a family of seven GPT large language models (LLMs), along with its methodology, the trained weights, and a training recipe, under the permissive industry-standard Apache 2.0 license. The release, called Cerebras-GPT, means the models can be used for research or commercial ventures without royalties.
The company used its own non-GPU systems, rather than Nvidia GPUs, to train LLMs of up to 13 billion parameters. All seven models were trained on the sixteen CS-2 systems that make up the Cerebras Andromeda AI supercomputer, using the Chinchilla formula.
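The Chinchilla formula referenced here is the compute-optimal scaling rule from Hoffmann et al. (2022), commonly summarized as roughly 20 training tokens per model parameter. As a rough sketch (the 20:1 ratio is the widely cited rule of thumb, not a figure from this article), the token budgets for the seven Cerebras-GPT sizes would be approximately:

```python
# Chinchilla rule of thumb: ~20 training tokens per parameter
# (Hoffmann et al., 2022). Model sizes are those named in the article.
CHINCHILLA_TOKENS_PER_PARAM = 20

model_params = {
    "111M": 111e6,
    "256M": 256e6,
    "590M": 590e6,
    "1.3B": 1.3e9,
    "2.7B": 2.7e9,
    "6.7B": 6.7e9,
    "13B": 13e9,
}

def compute_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token count for a model with n_params parameters."""
    return CHINCHILLA_TOKENS_PER_PARAM * n_params

for name, n in model_params.items():
    print(f"{name}: ~{compute_optimal_tokens(n) / 1e9:.0f}B tokens")
```

Under this rule, the 13B-parameter model would be trained on roughly 260 billion tokens.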
“These are the highest-accuracy models for a given compute budget, and they are available today as open source,” said the company.
In a first among AI hardware companies, Cerebras researchers trained a series of seven GPT models with 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B parameters.
“Typically a multi-month undertaking, this work was completed in a few weeks thanks to the incredible speed of the Cerebras CS-2 systems that make up Andromeda, and the ability of Cerebras’ weight streaming architecture to eliminate the pain of distributed computing. These results demonstrate that Cerebras’ systems can train the largest and most complex AI workloads today.”
- “The training weights provide a highly accurate pre-trained model for fine-tuning. By applying a modest amount of custom data, anyone can create powerful, industry-specific applications with minimal work.”
- “The models’ various sizes and their accompanying checkpoints allow AI researchers to create and test new optimizations and workflows that broadly benefit the community.”
Traditional LLM training on GPUs requires a complex amalgam of pipeline, model, and data parallelism techniques. Cerebras’ weight streaming architecture is a data-parallel-only model that requires no code or model modification to scale to arbitrarily large models.
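To illustrate the data-parallel idea in the abstract (this is a conceptual toy, not Cerebras’ weight-streaming implementation): each worker computes gradients on its own shard of the batch, and averaging those per-worker gradients reproduces the full-batch gradient, so no model or pipeline partitioning is needed.

```python
# Toy illustration of data parallelism: each "worker" computes the
# gradient of a mean-squared-error loss on its own shard of the data,
# and the average of the shard gradients equals the full-batch gradient
# when the shards are equal in size.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to a scalar weight w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Full-batch gradient, as a single device would compute it.
full = mse_grad(w, xs, ys)

# Data-parallel: split the batch across two equal-size workers,
# then average the per-worker gradients.
g1 = mse_grad(w, xs[:2], ys[:2])
g2 = mse_grad(w, xs[2:], ys[2:])
averaged = (g1 + g2) / 2

print(full, averaged)  # identical for equal shard sizes
```

Because only the data is partitioned, scaling to more workers changes nothing in the model code, which is the property the quote above attributes to weight streaming.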
“We’ve worked to make this task easier with releases such as the Pile and the Eval Harness, and we are very excited to see Cerebras build on our work to produce a family of open models that will be useful to researchers around the world,” said Stella Biderman, Executive Director at EleutherAI.
Cerebras published a technical blog post with the details of the seven models and the scaling laws that they produce. A research paper will be released shortly.
The company posted not only the models’ source code, in Python and TensorFlow format, but also the details of the training regimen by which the models were brought to a working state.
Currently, a handful of companies hold the keys to LLMs. OpenAI is closed, with GPT-4 operating as a black box for the public. Meta’s LLaMA is closed to for-profit organizations, and Google’s models are closed to varying degrees.
Cerebras, echoing the researchers’ community, says that AI needs to be open and reproducible for it to broadly benefit humanity.
🎉 Exciting news! Today we are releasing Cerebras-GPT, a family of 7 GPT models from 111M to 13B parameters trained using the Chinchilla formula. These are the highest accuracy models for a compute budget and are available today open-source! (1/5)
— Cerebras (@CerebrasSystems) March 28, 2023
You can now clone ChatGPT!
OpenAI didn’t open-source its models, so we don’t know much behind the scenes.
But the first complete end-to-end model pipeline was just released, and it’s the most practical open-source project resembling ChatGPT.
Here are the details:
— Santiago (@svpino) March 29, 2023