Text to Speech Avatar in Azure AI is now generally available

Posted by QinyingLiao

Today, we are excited to announce that Text to Speech (TTS) Avatar, a capability of Azure AI Speech service, is now generally available for developers, enterprises and content creators.



This service brings natural-sounding voices and photorealistic avatars to life, enhancing customer engagement and overall experience. With TTS Avatar, developers can create personalized and engaging experiences for their customers and employees, while also improving efficiency and providing innovative solutions.

The TTS Avatar service provides developers with a variety of prebuilt avatars, a diverse portfolio of natural-sounding voices, and an option to create custom synthetic voices using Azure Custom Neural Voice. Additionally, the photorealistic avatars can be customized to match a company's branding. Developers can use TTS Avatar to generate speech and avatar video in real time or in batch mode, depending on the needs of their applications.


Prioritizing responsible AI is fundamental to our Text to Speech Avatar capability. We developed it to adhere to our responsible AI principles, and we offer Custom Avatar as a limited access service, with only a select number of use cases approved through a controlled application and review process. Scroll to the end of this blog to learn more about our approach to responsible AI for TTS Avatar.



Selected use cases and customers




Let's take a closer look at some of the key use cases for TTS Avatar:



Customer service


Chatbots are a popular way for businesses to provide 24/7 customer service. Azure TTS Avatar could help enhance customer experience by providing a more personalized and engaging interaction. An avatar can answer customer questions, provide troubleshooting assistance, and even help customers complete transactions. This improves customer satisfaction and reduces the workload on customer service agents.



With the general availability of TTS Avatar, we are closely collaborating with customers and partners around the world to develop engaging customer service solutions for a variety of industries.



KPMG, a multinational professional services network, is leveraging TTS Avatar to create personalized and engaging customer service solutions for their customers.

"By utilizing Microsoft Azure’s TTS Avatar service with Custom Neural Voice, businesses can create personalized and engaging experiences for their customers and employees, while also improving efficiency and providing innovative solutions, as well as reducing costs in certain customer service areas," says Sina Steidl-Küster, Managing Partner of KPMG Germany/Region Southwest.



Fujifilm is incorporating TTS Avatar into NURA, the world’s first AI-powered health screening center.

"Embracing the Azure TTS Avatar at NURA as our 24-hour AI assistant marks a pivotal step in healthcare innovation. At NURA, we envision a future where AI-powered assistants redefine customer interactions, brand management, and healthcare delivery. Working with Microsoft, we're honored to pioneer the next generation of digital experiences, revolutionizing how businesses connect with customers and elevate brand experiences, paving the way for a new era of personalized care and engagement. Let's bring more smiles together," says Dr. Kasim, Executive Director and COO, Nura AI Health Screening, Fujifilm.

MAPFRE, an insurance company in Spain, is using Azure TTS Avatar to generate videos that improve communication and efficiency, drive innovation, and optimize processes.

"In MAPFRE, we have assessed Microsoft’s Avatar service, and it has demonstrated great value to us because of its ability to enhance the user experience and promote collaboration. Additionally, its use can drive innovation and optimize processes, adding significant value to our organization," says Ubaldo Gonzalez, Chief Data Officer MAPFRE Spain.



Dentsu Digital, a comprehensive digital marketing company, is using Azure TTS Avatar to generate lifelike voices and avatars to enhance the overall customer experience and promote collaboration.

"New challenges invariably demand bold approaches. We are deeply honored to collaborate with Microsoft, leveraging their cutting-edge technology and expertise as we aim to implement this vision into society and usher in a new era," says Tomohiko Sugiura, Executive Vice President, Dentsu Digital Inc.


Bank SinoPac is enabling its chatbot to talk to and interact with customers using TTS Avatar in its kiosks.

"Azure’s TTS Avatar technology has sparked great expectations for lifelike agents. With the imminent arrival of AGI second level and continuous evolution, I am confident that there will be more diverse and innovative applications for financial services and efficiency improvement," says Coolson Shen, Chief Information Officer of Bank SinoPac.



Herbalife is working with Microsoft to build real-time chatbots for their products.

“Herbalife has always been committed to finding innovation solutions to elevate well-being. Partnering with Microsoft propels us into the future and connects our global community like never before. With AI avatars that leverage Text-to-Speech and custom neural voice pro technology, we have more agility to answer inquiries, offer wellness tips and provide advice to empower our consumers to live their best lives,” says Monica Kedzierski, VP Global Data, Analytics & AI, Herbalife.



Lokeshwar R Vangala, Senior Director of Engineering, Data & AI at Coca-Cola, aptly stated, “Plain vanilla chatbots are a relic of the past. Enter the new era with virtual avatars and influencers! Microsoft's virtual avatar with custom neural voice (CNV) revolutionizes customer support and marketing, offering lifelike interactions that engage users like never before. These avatars enhance user experience, provide personalized assistance, and boost brand loyalty. In the competitive GenAI arena, Microsoft’s scalable technology is the key to staying ahead and delivering unmatched value.”



E-commerce


Avatars are also being used in e-commerce to offer a more personalized and engaging shopping experience. Videos represent a powerful means for businesses to engage with their customers. Streaming commerce, a fresh approach to shopping, involves live streaming videos of products and services. This allows customers to engage with the host and make real-time purchases.



As an example, Microsoft Store on JD.com is leveraging avatars to enhance the streaming commerce experience. During live streaming events, a lifelike avatar can interact with customers in real time, providing product information and answering customer questions. The avatar can also assist with the purchasing process, making it easy for customers to complete their transactions without leaving the streaming platform. With TTS Avatar, Microsoft Store on JD.com was able to drive sales and increase customer engagement, while also promoting collaboration and trust between the customer and the brand.






Content consumption


TTS Avatar significantly enhances content consumption by converting text into natural, human-like speech, making content accessible and convenient. The avatar's visual element increases engagement through human-like emotions, while its customization capabilities offer personalized user experiences, fostering greater satisfaction and loyalty. Additionally, by supporting multiple languages, TTS Avatar breaks language barriers, making content more inclusive and accessible to a broader audience.



Mediapro, a leading group in the European audiovisual sector, unique in content integration, production and audiovisual distribution, is working with Microsoft to innovate their digital communications. “We have created AIMar, an avatar based on MSFT technology purposefully designed for the Communications department. AIMar mimics a real Communications professional and enables generating communication messages and campaigns at any time, in any language,” says Mayte Hidalgo, Head of AI Center of Excellence of Grup Mediapro.



TTS Avatar with GPT-4o



It’s easy to get started with TTS Avatar: use batch synthesis to create videos, or use real-time synthesis with Azure OpenAI Service GPT-4o integrated to build live chats.



Developers can take advantage of Azure TTS Avatar’s API and SDKs to integrate the service into their applications. The API and SDKs provide a straightforward interface for generating speech and avatar video, so developers can incorporate Azure TTS Avatar into their existing workflows. Check out the documentation on real-time synthesis for live-chat avatars and batch synthesis for avatars.
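To give a feel for the batch path, here is a minimal Python sketch (standard library only) that builds a batch avatar synthesis request, submits it, and polls the job until the video is ready. The endpoint shape (`https://{region}.api.cognitive.microsoft.com/avatar/batchsyntheses/{id}`), the `api-version` value, the JSON field names, the status strings (`Succeeded`, `Failed`), and the `outputs.result` field are assumptions based on the public REST documentation at the time of writing, so please verify them against the batch synthesis documentation. The helper names are hypothetical and not part of any Azure SDK.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumed API version for the avatar batch synthesis REST API; verify before use.
API_VERSION = "2024-08-01"


def build_batch_avatar_request(region: str, key: str, synthesis_id: str, text: str,
                               voice: str = "en-US-AvaMultilingualNeural",
                               character: str = "lisa",
                               style: str = "graceful-sitting") -> dict:
    """Return URL, headers and JSON body for a batch avatar job. Nothing is sent here."""
    url = (f"https://{region}.api.cognitive.microsoft.com/avatar/batchsyntheses/"
           f"{synthesis_id}?api-version={API_VERSION}")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    # Field names below are assumptions taken from the public REST docs.
    body = {
        "inputKind": "PlainText",
        "synthesisConfig": {"voice": voice},  # prebuilt or custom neural voice name
        "inputs": [{"content": text}],
        "avatarConfig": {
            "talkingAvatarCharacter": character,  # prebuilt avatar character
            "talkingAvatarStyle": style,
            "videoFormat": "mp4",
            "videoCodec": "h264",
            "subtitleType": "soft_embedded",
        },
    }
    return {"url": url, "headers": headers, "body": body}


def submit(req: dict) -> dict:
    """Create the job with an HTTP PUT and return the service's JSON response."""
    data = json.dumps(req["body"]).encode("utf-8")
    http_req = urllib.request.Request(req["url"], data=data,
                                      headers=req["headers"], method="PUT")
    with urllib.request.urlopen(http_req) as resp:
        return json.loads(resp.read())


def fetch_status(req: dict) -> dict:
    """GET the job's current state from the same URL (assumed to return JSON)."""
    http_req = urllib.request.Request(req["url"], headers=req["headers"], method="GET")
    with urllib.request.urlopen(http_req) as resp:
        return json.loads(resp.read())


def wait_for_result(get_status: Callable[[], dict], interval_s: float = 5.0,
                    max_polls: int = 120,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll until the job succeeds (return the video URL) or fails (raise)."""
    for _ in range(max_polls):
        job = get_status()
        status = job.get("status")
        if status == "Succeeded":
            return job.get("outputs", {}).get("result")
        if status == "Failed":
            raise RuntimeError(f"batch avatar synthesis failed: {job}")
        sleep(interval_s)  # still NotStarted or Running
    raise TimeoutError("batch avatar synthesis did not finish in time")
```

Usage might look like `req = build_batch_avatar_request("westus2", key, "my-job-1", "Hello!")`, then `submit(req)` and `video_url = wait_for_result(lambda: fetch_status(req))`. Keeping request construction separate from sending makes the payload easy to inspect and test without network access.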



We also provide sample code to aid in integrating the text-to-speech avatar with the GPT-4o model. Learn more about how to create lifelike chatbots with real-time avatars and Azure OpenAI Service, or dive into the code samples here (JavaScript and Python). For guidance on creating a live chat app using Azure OpenAI Service On Your Data, please refer to this sample code (search for "On Your Data").
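On the live chat side, one common pattern (an assumption here, not a description of the official samples' internals) is to have the avatar speak the GPT-4o response sentence by sentence as it streams in, rather than waiting for the whole reply. The Python sketch below shows only that glue logic: it groups streamed text deltas into sentences and wraps each one in SSML. The punctuation set and helper names are hypothetical, and the voice name is just an example prebuilt voice; actually sending the SSML to the avatar is left to the real-time avatar SDK.

```python
from typing import Iterable, Iterator
from xml.sax.saxutils import escape

# Hypothetical set of characters treated as sentence boundaries
# (ASCII plus full-width CJK punctuation).
SENTENCE_END = (".", "?", "!", ":", ";", "。", "？", "！")


def sentences_from_stream(deltas: Iterable[str]) -> Iterator[str]:
    """Group streamed text fragments into sentences as soon as each one ends."""
    buffer = ""
    for delta in deltas:
        for ch in delta:
            buffer += ch
            if ch in SENTENCE_END:
                sentence = buffer.strip()
                if sentence:
                    yield sentence
                buffer = ""
    # Flush whatever is left once the stream ends without final punctuation.
    tail = buffer.strip()
    if tail:
        yield tail


def to_ssml(sentence: str, voice: str = "en-US-AvaMultilingualNeural",
            lang: str = "en-US") -> str:
    """Wrap one sentence in minimal SSML, escaping XML-special characters."""
    return (f"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
            f"xml:lang='{lang}'><voice name='{voice}'>{escape(sentence)}</voice></speak>")
```

Each SSML string would then be passed to the avatar's speak call, for example `for s in sentences_from_stream(deltas): avatar_speak(to_ssml(s))`, where `deltas` is the GPT-4o text stream and `avatar_speak` is a placeholder for your SDK call. Note that this naive splitter also breaks on decimals such as "3.5"; a production version would need smarter sentence detection.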



Here is a demo of TTS live chat avatar integrated with GPT-4o.




For regional availability of the TTS Avatar capability, learn more here.



Responsible AI considerations




Microsoft


Microsoft believes that when you create technologies that can change the world, you must also ensure that the technology is used responsibly. Our goal is to develop and deploy AI that will have a beneficial impact and earn trust from society. Our work is guided by a core set of principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. We take a cross-company approach through cutting-edge research, best-of-breed engineering systems, and excellence in policy and governance.



Microsoft is committed to helping our customers use our AI products responsibly, sharing our learnings, and building trust-based partnerships through tools like Transparency Notes and Impact Assessments. Many of these resources can be found at Empowering responsible AI practices | Microsoft AI.



Text to Speech service


As part of this commitment, we have integrated safety and security features and guidelines into Azure TTS Avatar. This includes measures to promote transparency in user interactions, mechanisms to identify and mitigate potential bias or harmful synthetic content, among other features.



In this transparency note, we describe the technology and capabilities for TTS Avatar, its approved use cases, considerations when choosing use cases, its limitations, fairness considerations, and best practices for improving system performance.



We require all developers and content creators to adhere to our code of conduct when using avatar features including prebuilt and custom avatars.



To ensure the responsible use of the technology, we have limited access to the custom avatar features. Custom avatars are available by registration only, and only for certain use cases. To access the feature, follow the limited access instructions to register your use case. In addition to the limited access requirement, you must obtain explicit permission from the avatar talent before creating an avatar model that resembles the talent’s appearance. We require every customer to upload a recorded video file with a pre-defined statement from the avatar talent acknowledging that the customer will use the talent’s image and voice to create a TTS avatar.



Content Safety and Watermark


Azure AI Content Safety is integrated into the batch synthesis process of text to speech avatars for video creation scenarios. This added layer of text moderation allows for the detection of offensive, risky, or undesirable text input, thereby preventing the avatar from producing harmful output. The text moderation feature spans multiple categories, including sexual, violent, hate, self-harm content, and more. It's available for batch synthesis of text-to-speech avatars both in Speech Studio and via the batch synthesis API.



To provide clearer insights into the source and history of video content created by text to speech avatars, we've adopted the Coalition for Content Provenance and Authenticity (C2PA) Standard. This standard offers transparent information about AI-generation of video content. For more details on the integration of C2PA with text to speech avatars, refer to Content Credentials in Azure Text to Speech Avatar.



Additionally, invisible watermarks are added to avatar outputs. These watermarks allow approved users to identify whether a video is synthesized using Azure AI Speech’s avatar feature. Eligible customers can use Azure AI Speech avatar watermark detection capabilities. To request watermark detection on a given video, please contact avatarvoice[at]microsoft.com.



Microsoft Azure


TTS Avatar is built on Microsoft Azure, a secure and compliant cloud infrastructure. Learn more about how your data will be processed and protected here.



Get started




Azure TTS Avatar is a powerful tool for developers looking to enhance customer engagement and improve overall experience. With a variety of use cases and customer references, it's clear that Azure TTS Avatar is paving the way for a new era of customer engagement and innovation. As developers, you can use Azure TTS Avatar to create personalized and engaging experiences for your customers and employees with a rich choice of prebuilt avatars and voices. You can also leverage Custom Avatar and Custom Neural Voice to create custom synthetic voices and avatars that look and sound like your brand. With responsible AI features that promote transparency and fairness, Azure TTS Avatar helps you create inclusive and ethical applications that serve a diverse range of users.



Learn more:

Try our TTS voice demo

Create a video using prebuilt avatars

Try our live chat demo with prebuilt avatars

Apply for access to Custom Avatar and Custom Neural Voice
