Google Releases "Gemini Omni," a New Model That Can Create Anything from Any Input

IBL News | New York

Google officially released its new Gemini Omni model yesterday at the company’s annual I/O developer conference in Mountain View, California. It is a single foundation model with a single editing surface, going far beyond the multimodal generative stack of text-to-image, image-to-video, video-to-video, and audio generation, as well as the image-generation and editing model Nano Banana.

Google presented it as “the first truly native model that can create anything from any input — starting with video.”

This model is available only to individual users through Google’s AI subscription plans, starting with the $20-per-user-per-month “AI Plus” plan. It can currently be accessed on the Gemini website and mobile apps, Google’s web-based Flow AI image and video editing suite, and YouTube Shorts.

The API — which many enterprises rely on for their AI needs — is not ready yet.

The first model in the family, Gemini Omni Flash, accepts any combination of text, images, audio, and video as input and produces high-quality output across the same modalities.

Google says the model is “natively multimodal from the ground up.”

For “marketing video,” Omni’s value proposition for enterprises is that of a programmable video and media engine rather than a creative app:

Sales and marketing: rapid generation of variant ads, localized creative, and product demos without per-asset agency cycles.
Internal communications, learning and development (L&D): explainer videos, onboarding modules, and policy walkthroughs produced by non-specialists.
Customer support and documentation: dynamic, query-conditioned visual explainers attached to help articles.
Product and engineering: visualization of simulations, UI walkthroughs, and concept videos for spec reviews.
Field operations: short, situation-specific instructional clips generated on demand.

The competitive landscape is crowded with Synthesia, TikTok parent ByteDance’s Seedance model, Kuaishou Technology’s Kling AI models, and the fast-improving open-source field.

We’re dropping Gemini Omni: our first step towards a model that can create anything from anything – starting with video.

It combines Gemini’s intelligence with our generative media systems – representing a leap forward in world understanding, multimodality, and editing 🧵 pic.twitter.com/GAtqzr0VIV

— Google DeepMind (@GoogleDeepMind) May 19, 2026