Watch Microsoft’s VASA-1 AI Make The Mona Lisa Sing Like A Rap Star In Wild Demo

Microsoft is showing off its AI chops with a new demo of VASA-1 that makes the Mona Lisa spit out rhymes like a rap star. The new framework generates lifelike talking faces of virtual characters with visual affective skills (VAS).

The fear of AI being used to make deepfakes of people may have just gotten a bit scarier. Microsoft says its newly announced VASA-1 model not only produces lip movements that are synchronized with audio, but also captures a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. Min Choi shared a video created with VASA-1 on X/Twitter of “Mona Lisa rapping Paparazzi.”

The software giant explained that the core innovations of VASA-1 include a holistic facial dynamics and head movement generation model that operates in a face latent space. VASA-1 is said to comprehensively outperform previous methods along various dimensions, delivering high video quality with realistic facial and head dynamics while supporting the online generation of 512x512 videos at up to 40 FPS with “negligible starting latency.”

With great power comes great responsibility, and Microsoft says it understands this when it comes to VASA-1’s capabilities. The company recognizes the possibility of the technology being misused, but adds that “it is imperative to recognize the substantial positive potential” of its technique. The benefits Microsoft lists include enhancing educational equity, improving accessibility for individuals with communication challenges, and offering companionship or therapeutic support to those in need. Microsoft concludes that it is dedicated to developing AI responsibly, with the ultimate goal of advancing human well-being.

With all that said, the software giant says it has no plans to release an online demo, API, product, additional implementation details, or any related offerings for VASA-1 until it is certain that the technology will be used responsibly and in accordance with proper regulations. So… perhaps never?