Apple Released ‘MGIE’, an Open Source AI Multimodal Model for Image Editing

IBL News | New York

Last week, Apple released MGIE (MLLM-Guided Image Editing), a new open-source AI model that edits images based on natural language instructions. It uses multimodal large language models (MLLMs) to interpret user commands and perform pixel-level manipulations.

Experts described MGIE as a major breakthrough and said it highlights the accelerating pace of progress in multimodal AI systems.

The model can handle a wide range of editing scenarios, including simple color and brightness adjustments, photo optimization, object manipulation, and Photoshop-style modifications such as cropping, resizing, rotating, flipping, and adding filters.

For example, given the instruction "make the sky more blue," MGIE derives a more explicit instruction, such as "increase the saturation of the sky region by 20%."
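
To make concrete what an explicit instruction like "increase the saturation of the sky region by 20%" asks for, here is a minimal Python sketch using only the standard library. It is not MGIE's code and does not reflect how the model performs edits internally. The function name `boost_saturation`, the list-of-tuples image layout, and the hand-supplied `mask` marking the "sky" pixels are all assumptions made for illustration.

```python
import colorsys


def boost_saturation(pixels, mask, factor=1.2):
    """Return a copy of `pixels` with saturation scaled by `factor` where `mask` is True.

    pixels: rows of (r, g, b) tuples with channel values in 0..255.
    mask:   rows of booleans with the same shape (hypothetical "sky" region).
    factor: 1.2 corresponds to a 20% saturation increase.
    """
    result = []
    for pixel_row, mask_row in zip(pixels, mask):
        new_row = []
        for (r, g, b), selected in zip(pixel_row, mask_row):
            if selected:
                # Convert to HSV, scale saturation, clamp to 1.0, convert back.
                h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
                s = min(1.0, s * factor)
                r, g, b = (round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))
            new_row.append((r, g, b))
        result.append(new_row)
    return result


# Toy 1x2 "image": the first pixel is treated as sky, the second is left alone.
image = [[(100, 150, 200), (90, 120, 60)]]
sky_mask = [[True, False]]
edited = boost_saturation(image, sky_mask)
```

In this sketch the masked pixel keeps its hue and brightness while its color becomes more vivid, and pixels outside the mask are returned unchanged.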

MGIE — which was presented in a paper accepted at the International Conference on Learning Representations (ICLR) 2024 — is the result of a collaboration between Apple and researchers from the University of California, Santa Barbara.

MGIE is available as an open-source project on GitHub, which includes a demo notebook showing how to use the model for various editing tasks. Users can also try MGIE online through a web demo hosted on Hugging Face Spaces.