Introducing super realistic AI voices optimized for conversations

  • Thread starter: Melinda Ma

Melinda Ma

Thanks to the power of Large Language Models (LLMs) such as Azure OpenAI GPT, AI can now produce more natural, fluent, and high-quality responses in human-bot conversations than ever before. As a result, the demand for naturalness and expressiveness in Text-to-Speech (TTS) voices used in spoken conversations is also higher than ever. We are introducing new voices designed specifically for conversational scenarios. Whether you are creating a speech-based chatbot, a voice assistant, or a conversational agent, these new voices will make your interactions more realistic, lifelike, and engaging.



The new realistic voices are well suited to any application that requires lifelike speech interactions, including chatbots, voice assistants, gaming, e-learning, entertainment, and more.



Meet the four new voices we are introducing today: en-US-AndrewNeural, en-US-BrianNeural, en-US-EmmaNeural, and zh-CN-YunjieNeural. All are optimized for conversational scenarios and are available in public preview in three regions: East US, Southeast Asia, and West Europe.



Check out the voice samples




Demo of new voices in comparison with other voices




Hear how these voices sound in conversation, compared with the existing voices that are designed for more general purposes.




Script: I can help you with a lot of things! I can answer questions, provide information on a wide range of topics, help you find things on the web, and more. If you have a specific question or task in mind, feel free to ask me and I'll do my best to assist you.
New voice optimized for conversations: Emma
Existing voice designed for general purpose: Jenny

Script: I'm not sure what you're asking. If you're asking for a paraphrase of the sentence "I learn about myself that I can lead a team", then it means that the speaker has discovered that they have the ability to lead a team. Is there anything else I can help you with?
New voice optimized for conversations: Andrew
Existing voice designed for general purpose: Guy

Script: 风筝有风,海豚有海,而您有我,感谢您的光临。么么哒! (English: "The kite has the wind, the dolphin has the sea, and you have me. Thank you for visiting. Mwah!")
New voice optimized for conversations: Yunjie
Existing voice designed for general purpose: Yunxi





More samples


Script: I understand. It sounds like a place that is both impressive and terrifying. I wonder what kind of tea they serve there. Is it made from the sun's rays or from something else? And who are the people who live there? Are they loyal to the Empire or do they have their own agendas?
New voice: Emma

Script: Yes, that is what I said. A maximin strategy is the one that maximizes the minimum payoff of a player, regardless of what the other players do. It is a way of ensuring that the player gets at least a certain amount of payoff, even in the worst case scenario.
New voice: Andrew

Script: If you can't find the information, you may want to consider contacting your state's insurance department. They may be able to help you locate any life insurance policies that were taken out on your husband. I hope this helps. Please let me know if you have any other questions.
New voice: Brian

Script: 好的,让我为您创建一个新的理赔单。请稍等。我已经为您创建了一个新的理赔单。我们会联系您安排修理您的车子。我们还会通过电子邮件给您发送一个链接,以便您可以上传您拍摄的照片。还有什么其他我可以帮助您的吗? (English: "OK, let me create a new claim for you. One moment, please. I have created a new claim for you. We will contact you to arrange the repair of your car. We will also email you a link so that you can upload the photos you took. Is there anything else I can help you with?")
New voice: Yunjie


Demo of full conversation





Conversations between Andrew and Emma (in English):




Conversations between Yunjie and Xiaochen (in Chinese):






Integrate these new voices with Azure OpenAI




You can add these new neural Text-to-Speech (TTS) voices to your applications using the Azure Speech SDK or the REST API. You can also use the Azure Bot Framework to build intelligent bots that use these new neural TTS voices for speech synthesis.
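
As a rough illustration of the REST route, the sketch below builds (but does not send) a synthesis request for one of the new voices. The voice names come from this announcement. The region identifiers (`eastus`, `southeastasia`, `westeurope`), the endpoint pattern, the output format, the `User-Agent` value, and the helper names `build_ssml` and `build_tts_request` are assumptions made for this example, so check them against the current Speech service documentation before relying on them.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

# Voice names are taken from the announcement; the locale is the name's prefix.
PREVIEW_VOICES = {
    "en-US-AndrewNeural": "en-US",
    "en-US-BrianNeural": "en-US",
    "en-US-EmmaNeural": "en-US",
    "zh-CN-YunjieNeural": "zh-CN",
}


def build_ssml(text: str, voice: str) -> str:
    """Wrap plain text in a minimal SSML document for the given voice."""
    locale = PREVIEW_VOICES[voice]
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{locale}">'
        f'<voice name="{voice}">{escape(text)}</voice>'
        "</speak>"
    )


def build_tts_request(text: str, voice: str, region: str, key: str) -> Request:
    """Prepare a POST request for the TTS REST endpoint (assumed URL pattern)."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # Assumed output format; other formats are available.
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "conversational-voice-demo",  # hypothetical app name
    }
    body = build_ssml(text, voice).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


request = build_tts_request(
    "Questions & answers are my thing!", "en-US-EmmaNeural", "eastus", "YOUR_KEY"
)
```

With a real subscription key, passing `request` to `urllib.request.urlopen` should return the synthesized audio bytes, assuming the endpoint and headers above match your Speech resource.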

To minimize latency when combining Large Language Models (LLMs) with TTS, we recommend sending text to the TTS service while the LLM is still generating its response, rather than waiting for the full reply. A demo sample is available that shows how to generate TTS responses in a streaming manner.
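
One possible way to do this, sketched below, is to buffer the LLM's streamed tokens and hand each complete sentence to TTS as soon as it ends. This is not the demo sample mentioned above. The function name `sentences_from_tokens` and the boundary rule are assumptions for illustration, and the rule is deliberately simple (it will split after abbreviations such as "e.g. ").

```python
import re
from typing import Iterable, Iterator

# A sentence ends at . ! ? followed by whitespace, or right after a CJK
# full stop / exclamation mark / question mark (no whitespace required).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|(?<=[。!?])")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        match = _BOUNDARY.search(buffer)
        while match:
            sentence = buffer[: match.start()].strip()
            if sentence:
                yield sentence
            # Drop the emitted sentence (and any separating whitespace).
            buffer = buffer[match.end():]
            match = _BOUNDARY.search(buffer)
    # Flush whatever is left when the LLM stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


# Simulated LLM output arriving in small pieces.
llm_tokens = ["I can", " help", " you.", " Ask", " me any", "thing!", " 好的。", "请稍等"]
chunks = list(sentences_from_tokens(llm_tokens))
```

Each yielded sentence can then be passed to whatever synthesis call your application uses, so the first audio can start playing while later sentences are still being generated.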



Technology behind the voices




We began by crafting the persona of each voice as if it were a real person who is friendly and optimistic about life, always eager to assist others and share intriguing or practical knowledge. The speaking style of the voice resembles a conversation with an acquaintance over a cup of tea, maintaining a natural and unexaggerated tone.

Furthermore, we continuously enhance our Text-to-Speech (TTS) modeling techniques to improve the quality of our AI voices. Our most recent projects, such as DelightfulTTS 2 and MuLanTTS, have significantly narrowed the quality gap between AI voices and professional human recordings, producing more natural and realistic voices than ever before. These advances are the foundation on which the new AI voices are built.



Get started




Microsoft offers over 400 neural voices covering more than 140 languages and locales. With these Text-to-Speech voices, you can quickly add read-aloud functionality for a more accessible app design or give a voice to chatbots to provide a richer conversational experience to your users. In addition, with the Custom Neural Voice capability, you can easily create a brand voice for your business.



For more information

