Alibaba's new Qwen models can clone voices from three seconds of audio

2025-12-24

Summary

Alibaba has introduced two new AI models, Qwen3-TTS-VD-Flash and Qwen3-TTS-VC-Flash, that create or clone voices from text commands and short audio clips. The first lets users generate voices by specifying detailed characteristics, while the second replicates a voice from just three seconds of audio in ten languages, reportedly with high accuracy.

Why This Matters

These models make voice synthesis more precise and versatile. That could affect industries that rely on voice technology, such as customer service, entertainment, and content creation, by enabling more customizable and realistic voice interactions.

How You Can Use This Info

Professionals in marketing or media can use these models to create audio content tailored to specific audiences. Customer service departments might use voice cloning to personalize interactions and improve the user experience. Content creators can use the models to generate diverse audio narratives. Demos of these models are available on Hugging Face.
