Since global data began growing exponentially about a decade ago, its growth has shown no sign of slowing. Much of it is generated on the internet, through social networks, web search requests, text messages, and media files. With the world now powered by big data, companies increasingly need big data consulting experts capable of handling complex data processing. MagicData Technology is one of the world's leading one-stop AI data service providers. The company is committed to providing a wide range of data services in the fields of automatic speech recognition (ASR), text-to-speech (TTS), computer vision, and natural language processing (NLP). It aims to address each client's needs precisely, with the most appropriate solutions delivered through a systematic process. The company has worked with some of the world's largest AI companies and Fortune 500 companies and enjoys a strong reputation.

Founded in 2016, MagicData Technology has quickly grown into one of the foremost companies in the artificial intelligence industry thanks to its high-expertise, high-precision data services. "We strive to provide the most efficient and highest-quality one-stop data services for customers in the fields of speech recognition, intelligent imaging, and Natural Language Understanding (NLU). Our services include data scheme design, data collection, data annotation/transcription, etc.," says Zhang Qingqing, CEO of MagicData.

The MagicData Mandarin Chinese Read Speech Corpus was developed by MagicData Technology and freely published for non-commercial use. It consists of 755 hours of scripted read speech from 1,080 native Mandarin Chinese speakers from mainland China.


MagicData Technology also developed a 1,500-hour Japanese Read Speech Recognition Corpus. A 30-hour subset of scripted read speech has been freely published for non-commercial use. It features 37 native speakers from different regions, including Tokyo, Osaka, and Hokkaido. The subset is intended as a test set; it was recorded indoors and is distributed in PCM format. The recording scripts are drawn from everyday conversation.

The MagicData Kid Voice TTS Corpus was recorded by a four-year-old Chinese girl born in Beijing, China. A 15-minute sample of speech from the corpus has been published for non-commercial use.

By supplying more accurate datasets, the company helps clients build higher-quality speech recognition systems. These datasets can be used to train models for a wide range of applications, such as voice assistants, smart homes, call centers, and in-car infotainment systems.

Further details of the MagicData Mandarin Chinese Read Speech Corpus are as follows. Its 755 hours of speech data were mostly recorded on mobile devices by 1,080 speakers invited from different accent regions of China. Recordings were made in a quiet indoor environment, and sentence transcription accuracy is higher than 98%. The database is divided into training, validation, and test sets in a ratio of 51:1:2. Detailed information, such as the speech data encoding and speaker information, is preserved in the metadata file. The recording texts cover diverse domains, including interactive Q&A, music search, SNS messages, and home command and control.
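As a rough illustration of what a 51:1:2 split means in practice, the short Python sketch below converts the ratio into approximate hours per set. It assumes the ratio applies to total audio duration; the published split may instead be defined by utterances or speakers, so the real per-set figures can differ. The names `split_hours`, `TOTAL_HOURS`, and `RATIO` are illustrative only.

```python
# Approximate per-split durations implied by a train:validation:test ratio.
# Assumption: the 51:1:2 ratio applies to total audio duration (hours);
# the actual corpus split may be defined by utterances or speakers instead.

TOTAL_HOURS = 755  # total corpus duration stated in the text
RATIO = {"train": 51, "validation": 1, "test": 2}  # stated split ratio


def split_hours(total: float, ratio: dict[str, int]) -> dict[str, float]:
    """Distribute `total` across splits in proportion to `ratio`."""
    parts = sum(ratio.values())  # 51 + 1 + 2 = 54 parts
    return {name: total * share / parts for name, share in ratio.items()}


for name, hours in split_hours(TOTAL_HOURS, RATIO).items():
    print(f"{name}: {hours:.1f} h")
# train: 713.1 h
# validation: 14.0 h
# test: 28.0 h
```

Under this assumption, roughly 94% of the audio goes to training, with about 14 hours held out for validation and about 28 hours for testing.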