IBL News | New York
Apple researchers have presented new methods for training multimodal large language models (LLMs) on both text and images, in what appears to be a significant advance for generative AI and for future Apple products.
The research was described in a paper posted this month, titled “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training”.
The largest model, with 30 billion parameters, showed a strong ability to learn from only a handful of examples and to reason over multiple images.
On multimodal benchmarks, the model's results are competitive with GPT-4V and Gemini Pro.
The model was trained on a carefully curated mix of image-caption data, interleaved image-text data, and text-only data.
Experts say that the level of detail Apple disclosed is a big departure for the typically secretive company, and represents a significant win for the open-source community.