Apple Unveils New Methods for Training Multimodal AI Models

IBL News | New York

Apple researchers have presented new methods for training multimodal large language models (LLMs) on both text and images, in what appears to be a significant advance for generative AI and future Apple products.

The research was described in a paper posted this month, titled “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training”.

The largest model in the family, with 30 billion parameters, showed a strong ability to learn in-context from only a handful of examples and to reason over multiple images.

On multimodal benchmarks, the model is competitive with GPT-4V and Gemini Pro.

The model was trained on a carefully curated mix of image-caption pairs, interleaved image-text documents, and text-only data.
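In practice, a recipe like this comes down to how often the training loop samples from each source. As a minimal sketch, and assuming hypothetical mixture weights (placeholders, not the exact ratios reported in the paper), a data loader might pick a source for each example like this:

```python
import random

# Hypothetical mixture weights for illustration only; the MM1 paper
# ablates this ratio, and these numbers are not taken from it.
MIXTURE = {
    "image_captions": 0.45,          # short caption-image pairs
    "interleaved_image_text": 0.45,  # documents mixing images and prose
    "text_only": 0.10,               # plain text, preserves language skills
}

def sample_source(rng: random.Random) -> str:
    """Pick the data source for the next training example,
    proportionally to its mixture weight."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in MIXTURE}
    for _ in range(10_000):
        counts[sample_source(rng)] += 1
    print(counts)  # draws roughly track the mixture weights
```

The point of such a scheme is that the ratio between caption, interleaved, and text-only data is itself a tunable design choice, which is what the paper's ablations examine.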

Experts say that this level of published detail is a notable departure from Apple’s typically secretive approach and represents a significant win for the open-source community.