Guest ArcherZ
Posted

We are super excited to announce the Public Preview of the Fast Transcription service in Azure AI Speech, which lets customers and developers transcribe audio files to text accurately and synchronously, with a high speed factor.

The Fast Transcription service includes our latest end-to-end model technologies, delivering our best quality and a very high speed factor by leveraging GPU inference (it can transcribe a 30-minute audio file in less than 1 minute).
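
As a rough illustration of that figure: the post does not define "speed factor", so the sketch below assumes it means audio duration divided by processing time. The function name `speed_factor` is hypothetical and only serves the arithmetic.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Audio duration divided by wall-clock processing time (assumed definition)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# A 30-minute file transcribed in exactly 1 minute gives a factor of 30,
# so finishing in less than 1 minute implies a factor above 30.
print(speed_factor(30 * 60, 60))  # 30.0
```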

Supported Locales: en-US, zh-CN, fr-FR, it-IT, es-ES, es-MX, ja-JP, ko-KR, pt-BR, hi-IN (more coming soon; learn more about language support in Speech service)

 

Supported Regions: East US, Southeast Asia, West Europe, Central India (learn more about region support in Speech service). In some unsupported regions, the service may still return the transcription result correctly, but at a slower speed.

 

Supported Audio Formats / Codecs: WAV, MP3, OPUS/OGG, FLAC, WMA, AAC, ALAW in WAV container, MULAW in WAV container, AMR, WebM, M4A, and SPEEX.
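
The three lists above can be turned into a quick client-side check before uploading. This is only a sketch: the `preflight` helper is hypothetical, the region identifiers are the usual Azure short names for the regions listed, and the mapping from formats to file extensions is an assumption (the service identifies formats by content, not by file name). Because unsupported regions may still work at a slower speed, the helper returns warnings rather than raising errors.

```python
from pathlib import Path
from typing import List

# Values copied from the preview announcement.
SUPPORTED_LOCALES = {"en-US", "zh-CN", "fr-FR", "it-IT", "es-ES",
                     "es-MX", "ja-JP", "ko-KR", "pt-BR", "hi-IN"}
SUPPORTED_REGIONS = {"eastus", "southeastasia", "westeurope", "centralindia"}

# Assumption: typical file extensions for the listed formats and codecs.
ASSUMED_EXTENSIONS = {".wav", ".mp3", ".opus", ".ogg", ".flac", ".wma",
                      ".aac", ".amr", ".webm", ".m4a", ".spx"}


def preflight(locale: str, region: str, audio_path: str) -> List[str]:
    """Return a list of warnings; an empty list means nothing looks off."""
    warnings = []
    if locale not in SUPPORTED_LOCALES:
        warnings.append(f"locale {locale!r} is not in the preview list")
    # Accept both "East US" and "eastus" spellings.
    if region.replace(" ", "").lower() not in SUPPORTED_REGIONS:
        warnings.append(f"region {region!r} is not listed; it may be slower")
    if Path(audio_path).suffix.lower() not in ASSUMED_EXTENSIONS:
        warnings.append(f"{audio_path!r} has an unexpected file extension")
    return warnings


print(preflight("en-US", "East US", "meeting.wav"))  # []
```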

[HEADING=1]Use with REST API:[/HEADING]

 

You can use Fast Transcription via the Speech-to-text REST API (version 2024-05-15-preview or later).

 

The fast transcription API uses multipart/form-data to submit audio files for transcription. The API returns the transcription results synchronously.

 

Construct the request body according to the following instructions:

 

  • Set the required [iCODE]locales[/iCODE] property. This value should match the expected locale of the audio data to transcribe. To enable automatic language detection (coming in a future version), provide a list of candidate locales.
  • Optionally, set the [iCODE]profanityFilterMode[/iCODE] property to specify how to handle profanity in recognition results. Accepted values are [iCODE]None[/iCODE] to disable profanity filtering, [iCODE]Masked[/iCODE] to replace profanity with asterisks, [iCODE]Removed[/iCODE] to remove all profanity from the result, or [iCODE]Tags[/iCODE] to add profanity tags. The default value is [iCODE]Masked[/iCODE].
  • Optionally, set the [iCODE]channels[/iCODE] property to specify the audio channels to transcribe (up to two channels). By default, for dual-channel audio, channels 0 and 1 are both considered and the recognition result is downmixed.
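
The properties above can be assembled into the JSON string sent as the `definition` form field. The following is a minimal sketch; `build_definition` and its parameter names are hypothetical helpers, and the validation only mirrors the rules stated in this post (required `locales`, four accepted profanity modes, at most two channels).

```python
import json

# Accepted values as listed in the announcement.
PROFANITY_MODES = {"None", "Masked", "Removed", "Tags"}


def build_definition(locales, profanity_filter_mode=None, channels=None) -> str:
    """Build the JSON text for the `definition` form field."""
    if not locales:
        raise ValueError("locales is required")
    definition = {"locales": list(locales)}

    # Optional: omit to let the service apply its default ("Masked").
    if profanity_filter_mode is not None:
        if profanity_filter_mode not in PROFANITY_MODES:
            raise ValueError(f"unknown profanityFilterMode: {profanity_filter_mode!r}")
        definition["profanityFilterMode"] = profanity_filter_mode

    # Optional: omit to let the service apply its default channel handling.
    if channels is not None:
        channels = list(channels)
        if not 1 <= len(channels) <= 2:
            raise ValueError("channels accepts one or two channel indexes")
        definition["channels"] = channels

    return json.dumps(definition)


print(build_definition(["en-US"], "Masked", [0, 1]))
# {"locales": ["en-US"], "profanityFilterMode": "Masked", "channels": [0, 1]}
```

Leaving the optional arguments as `None` keeps them out of the JSON entirely, so the service defaults described above apply.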

 

Make a multipart/form-data POST request to the endpoint with the audio file and the request body properties. The following example shows how to create a transcription using the fast transcription API.

 

  • Replace [iCODE]YourSubscriptionKey[/iCODE] with your Speech resource key.
  • Replace [iCODE]YourServiceRegion[/iCODE] with your Speech resource region.
  • Replace [iCODE]YourAudioFile[/iCODE] with the path to your audio file.
  • Set the form definition properties as previously described.

[CODE]
curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-05-15-preview' \
--header 'Content-Type: multipart/form-data' \
--header 'Accept: application/json' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{\"locales\":[\"en-US\"], \"profanityFilterMode\": \"Masked\", \"channels\": [0,1]}"'
[/CODE]
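
For callers who prefer a script to curl, the same request can be sketched with the Python standard library. The endpoint, headers, and form field names (`audio`, `definition`) are taken from the curl call above; the helper names `build_request` and `transcribe`, the `application/octet-stream` part type, and the hand-built multipart body are assumptions of this sketch. The response schema is not described in this post, so the JSON is returned as-is.

```python
import json
import os
import urllib.request
import uuid

API_VERSION = "2024-05-15-preview"


def build_request(region, key, audio_bytes, filename, definition):
    """Build (but do not send) the multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Part 1: the audio file (file name assumed to contain no quotes).
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        b"\r\n",
        # Part 2: the definition, sent as JSON text.
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="definition"\r\n\r\n',
        json.dumps(definition).encode("utf-8"),
        b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    url = (f"https://{region}.api.cognitive.microsoft.com"
           f"/speechtotext/transcriptions:transcribe?api-version={API_VERSION}")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def transcribe(region, key, audio_path, definition):
    """Send the request and return the parsed JSON response unchanged."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    request = build_request(region, key, audio_bytes,
                            os.path.basename(audio_path), definition)
    # The API is synchronous, so the result arrives in this response.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires a real Speech resource key, region, and audio file):
# result = transcribe("YourServiceRegion", "YourSubscriptionKey",
#                     "YourAudioFile", {"locales": ["en-US"]})
```

Splitting request construction from sending keeps the multipart body easy to inspect without making a network call.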

[HEADING=1]Try Out in AI Studio:[/HEADING]

 

AI Studio -> AI Services -> Speech -> Fast Transcription

 

[HEADING=1]Use in your scenarios:[/HEADING]

 

Fast Transcription is a great fit for many scenarios that take audio files as input, such as Copilots, audio/video captioning and editing, video translation, and post-call analytics.

 

Learn more about how ClipChamp uses it for auto captioning, and how OPPO uses it in their AI phones.

[HEADING=1]Useful resources:[/HEADING]

 
