Microsoft releases largest publicly available speech data for three Indian languages to aid researchers

Published on September 6, 2018

Microsoft India has announced the availability of Microsoft Indian Language Speech Corpus to help researchers and academia build Indian language speech recognition for all applications where speech is used.

Available for Telugu, Tamil, and Gujarati, this is the largest publicly available Indian language speech dataset and includes audio and corresponding transcripts.

The Speech Corpus content is provided as part of the Microsoft Research Open Data initiative, a collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain-specific sciences.

Microsoft Indian Language Speech Corpus was launched to address the scarcity of adequate digital data for text, speech, and linguistic resources – which are imperative in building large machine learning models for many vernacular languages across the world. The development of accurate digital tools in Indian languages has been slow owing to subtle differences in enunciation, accent, diction, and slang across various regions in India.

Microsoft’s Indian Language Speech Corpus was tested at Interspeech 2018, the world’s largest and most comprehensive conference on the science and technology of spoken language processing, and was used to create high-quality speech recognition models, thus validating the efficacy of the Corpus.

It is imperative that India’s increasing digital literacy is supported by a multilingual digital world, and initiatives like this one for researchers and academia will help accelerate innovation in voice-based computing for India.

Radu Tyrsina
