Microsoft Research Presents VASA-1, an AI Framework for Generating Hyper-Realistic Talking Videos

IBL News | New York

Microsoft researchers presented VASA-1, a framework that generates hyper-realistic talking-face video in real time, with expressive facial behavior, precise lip-audio synchronization, and naturalistic head motion. Together, these contribute to the perception of authenticity and liveliness.

The AI model takes a single static portrait photo and a speech audio clip and produces 512×512 video, at up to 40 FPS, of virtual characters with appealing visual affective skills (VAS).

“Our method significantly outperforms previous methods and it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors,” said Microsoft.

The company made clear that VASA-1 is only a research demonstration, with no plans for a product or API release.
