Open Source Software Projects Will Dominate LLMs

IBL News | New York

Experts say that, as happened with Linux, the world-class operating system, open source will dominate the future of LLMs and image models. Even a leaked internal Google memo acknowledged that the company has no moat in this new world of open source AI.

“If you’re building an AI native product, your primary goal is getting off of OpenAI as soon as you possibly can,” wrote Varun Shenoy in the viral article titled “Why Open Source AI Will Win.”

Furthermore, using closed-source model providers such as OpenAI or Anthropic for the long haul exposes an AI-native company to undue risk. Every business needs to own its core product, and its core product is a model trained on proprietary data.

The consensus is that open source models are already very good at the most valuable tasks, since they can be fine-tuned to cover perhaps as much as 99% of use cases once a product has collected enough labeled data.

As context lengths have scaled up, the hardware requirements for running massive models have scaled down.

The original Llama had a context length of 2k tokens; Llama 2 has a context length of 4k. Meanwhile, we still don’t have access to GPT-4 32k. This is the speed of open source.

Users can now run state-of-the-art large language models on their MacBooks thanks to projects like llama.cpp.
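
The article does not spell out why laptop-class hardware is now enough. One commonly cited factor is quantization, which llama.cpp supports: weights are stored at fewer bits each. The back-of-envelope sketch below is an illustration under stated assumptions, not a measurement. It counts weight storage only, ignores the KV cache and runtime overhead, and uses the nominal parameter counts of the Llama 2 family; the function name is hypothetical.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bytes.
# Assumption: only the weights are counted; KV cache, activations, and
# runtime overhead are ignored, so real usage will be higher.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold model weights, in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal Llama 2 sizes; 16-bit vs. an idealized 4-bit quantization.
    for name, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
        fp16 = weight_memory_gb(n_params, 16)
        q4 = weight_memory_gb(n_params, 4)
        print(f"{name}: {fp16:.1f} GB at 16-bit, {q4:.1f} GB at 4-bit")
```

Under these assumptions, a 7B-parameter model drops from roughly 14 GB of weights at 16-bit precision to about 3.5 GB at 4 bits, which is within reach of a consumer laptop.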

On the image generation side, Stable Diffusion XL (SDXL), the best open source model, is on par with Midjourney. Hugging Face is the new Red Hat.

  • “Linux succeeded because it was built in the open. Users knew exactly what they were getting and had the opportunity to file bugs or even attempt to fix them on their own with community support. The same is true for open source models.”
  • Open source models are much harder to use than closed-source ones. It seems like you need to hire a team of machine learning engineers to build on top of open source, as opposed to simply using the OpenAI API. This is OK and will be true in the short term. It is the cost of control and of the rapid pace of innovation.
  • Closed-source model providers have captured the collective mindshare of this AI hype cycle. People don’t have time to mess around with open source, nor do they have the awareness of what open source is capable of. But they do know about OpenAI, Pinecone, and LangChain.
  • As open source offerings mature and become more user-friendly and customizable, they will emerge as the superior choice for many applications.
  • Rather than getting swept up in the hype, forward-thinking organizations will use this period to deeply understand their needs and lay the groundwork to take full advantage of open source AI. They will build defensible and differentiated AI experiences on open technology. This measured approach enables a sustainable competitive advantage in the long run.