IBL News | New York
Microsoft researchers presented VASA-1, a framework that generates hyper-realistic talking-face video in real time, with lifelike facial behavior, precise lip-audio sync, and naturalistic head motion. Together, these contribute to the perception of authenticity and liveliness.
The AI model takes a single static portrait photo and a speech audio clip and produces 512×512 videos of virtual characters with appealing visual affective skills (VAS) at up to 40 FPS.
“Our method significantly outperforms previous methods and it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors,” said Microsoft.
The company made clear that VASA-1 is only a research demonstration, with no plans for a product or API release.
The First AI-Generated Video That Looks Super Real
Microsoft Research announced VASA-1.
It takes a single portrait photo and speech audio and produces a hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements… pic.twitter.com/6bxd4mEgFR
— Bindu Reddy (@bindureddy) April 17, 2024