Recent progress in AI voice cloning shows notable gains in both quality and accessibility. OpenAI and Google have both released models that push the limits of voice synthesis further still. For example, OpenAI's upcoming voice-enabled GPT-4 model reportedly produces very natural-sounding speech that approaches human delivery, and needs only about five minutes of recorded audio to model a speaker's voice accurately. Published reports claim accuracy of up to 95% for these models, which, if borne out, should make them effective for the purposes we intend to use them for.
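Voice-cloning systems commonly turn a reference recording into a fixed-length speaker embedding, which is roughly what "vectorizing" a voice means here. As a minimal sketch, assuming a speaker encoder has already produced the embeddings, the Python below compares a reference voice with a cloned one using cosine similarity, a common speaker-similarity measure. The four-dimensional vectors are made-up illustrative values, and it is an assumption that the 95% accuracy figure relates to a measure like this; the reports do not say how it was computed.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity of two equal-length speaker embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for illustration only; real speaker encoders
# output vectors with hundreds of dimensions computed from audio.
reference = [0.9, 0.1, 0.3, 0.2]
clone = [0.85, 0.15, 0.28, 0.25]

score = cosine_similarity(reference, clone)
print(f"speaker similarity: {score:.3f}")
```

A score close to 1.0 means the two embeddings point in nearly the same direction; a speaker-verification system would compare such a score against a threshold tuned on real data.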
Google's Text-to-Speech (TTS) API is another recent example, using WaveNet technology to provide more natural-sounding voices. Because WaveNet is built on artificial neural networks, the speech it generates sounds much closer to a human speaker in timing and intonation. Google bills by the character, and high-quality synthesis is quoted at around $4 per million characters, though rates vary by voice tier (WaveNet voices have typically cost more than the Standard tier) and change over time.
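Because billing is per character, estimating the cost of a job is simple arithmetic. The sketch below assumes the $4-per-million-characters figure quoted above as a default and uses a hypothetical `free_characters` allowance parameter; both should be replaced with the provider's current rates. It also uses `len(text)` as a stand-in for the billed character count, which may not match the provider's exact counting rules (for example, for markup).

```python
def synthesis_cost_usd(
    num_characters: int,
    price_per_million: float = 4.0,  # assumed rate, taken from the quoted figure
    free_characters: int = 0,  # hypothetical free allowance, if any
) -> float:
    """Estimate the cost of per-character TTS billing in US dollars."""
    if num_characters < 0 or free_characters < 0:
        raise ValueError("character counts must be non-negative")
    # Only characters beyond the free allowance are billed.
    billable = max(0, num_characters - free_characters)
    return billable / 1_000_000 * price_per_million


# Example: a short script repeated to simulate a longer narration.
script = "Welcome back to the show. " * 1000
chars = len(script)
print(f"{chars} characters -> ${synthesis_cost_usd(chars):.2f}")
```

Keeping the rate and allowance as parameters makes it easy to compare voice tiers side by side rather than hard-coding one price.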
The open-source space reflects the same trend: projects such as Mozilla TTS and Coqui drew attention early on for being easy to use and inexpensive. Mozilla TTS offers a way to generate voice clones at very low cost, while Coqui's models are highly configurable and can be extensively modified and improved. In user evaluations, Coqui TTS has reportedly received a quality score of 4.5 for its synthetic speech, which, if representative, places it among the few strong models for mimicking the properties of human voices.
Descript goes a step beyond stock synthetic voices with Overdub, which adds voice cloning to its broader suite of audio-editing tools. With a simple setup, creators can clone their own voice and use it directly in their projects. Overdub is priced like the subscription-based apps that are rapidly becoming common in the industry, although Descript also offers a free tier with basic transcription and simple synthesis tools.
As AI researcher Janelle Shane puts it: “Today's most advanced AI voice cloning can produce near-human quality audio recordings with little training data.” These developments show voice technology becoming more widespread and flexible than ever before.