Databricks Will Acquire AI Open-Source Startup MosaicML for $1.3 Billion

IBL News | New York

San Francisco-based data storage and management startup Databricks announced this week that it will pay $1.3 billion to acquire MosaicML, an open-source startup that enables businesses to build low-cost LLMs (large language models) with proprietary data.

MosaicML’s two models, MPT-7B and the recently released MPT-30B, have been downloaded 3.3 million times.

The deal is expected to close during Databricks’ second quarter ending July 31.

“Every organization should be able to benefit from the AI revolution with more control over how their data is used. Databricks and MosaicML have an incredible opportunity to democratize AI and make the Lakehouse the best place to build generative AI and LLMs,” said Ali Ghodsi, Co-Founder and CEO of Databricks.

Databricks intends to combine its Lakehouse Platform with MosaicML’s technology to offer customers a way to train and use LLMs with more control and ownership over how their data is used.

According to MosaicML, “combined with near linear scaling of resources, multi-billion-parameter models can be trained in hours, not days, and it will cost thousands of dollars, not millions.”

MosaicML, which launched in 2021 and has 62 employees today, has raised $64 million from investors including DCVC, AME Cloud Ventures, Lux, Frontline, Atlas, Playground Global, and Samsung Next.

Companies like Anthropic and OpenAI license ready-made language models to businesses, which then build generative AI apps on top of them. MosaicML says it can offer similar AI models at a lower cost and customize them with a company’s own data. The current cost of training a model on specialized data is estimated at $1 million to $2 million, according to experts.

Such domain-specific models can be more useful for companies than building on top of the entire corpus of data that OpenAI uses. Large language models are increasingly being fine-tuned for very specific applications, and at that point they can be small enough to be embedded in any cellphone.

Some of these models, built on smaller pre-trained models, are already available in open-source libraries like those offered by machine-learning startup Hugging Face.