Microsoft is taking speech recognition where no man has gone before. Rick Rashid, Microsoft's Chief Research Officer, recently offered a demonstration in Tianjin, China, at Microsoft Research Asia's 21st Century Computing event.
During his demonstration, Rashid showcased the latest results of a breakthrough in collaborative research between Microsoft and the University of Toronto--reducing the error rate for speech recognition by more than 30 percent compared to older methods. That, explained Rashid, means that instead of having one word in every four or five incorrect, the error rate is only one wrong word in every seven or eight.
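To see how the "one in four or five" and "one in seven or eight" figures line up with a drop of more than 30 percent, here is a rough back-of-the-envelope check. The midpoints 4.5 and 7.5 are our own assumption for illustration, not numbers Rashid gave.

```python
def relative_reduction(old_rate, new_rate):
    """Fraction by which the error rate dropped, relative to the old rate."""
    return (old_rate - new_rate) / old_rate

# Midpoints of the quoted ranges (an assumption made for illustration only).
old_rate = 1 / 4.5   # roughly one wrong word in every four or five
new_rate = 1 / 7.5   # roughly one wrong word in every seven or eight

reduction = relative_reduction(old_rate, new_rate)
print(f"old: {old_rate:.1%}, new: {new_rate:.1%}, reduction: {reduction:.0%}")
```

Under that assumption the relative drop works out to about 40 percent, which is consistent with the "more than 30 percent" claim.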
"While still far from perfect, this is the most dramatic change in accuracy since the introduction of hidden Markov modeling in 1979," Rashid wrote in a blog post, "and as we add more data to the training we believe that we will get even better results."
English to Chinese in Your Own Voice
Rashid's presentation also demonstrated how Microsoft takes the text that represents his speech and runs it through translation. Specifically, he turned his English into Chinese in two steps. The system first took his words and found the Chinese equivalents, which he called the hard part. The second step reorders the words to be appropriate for Chinese.
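The two steps Rashid described can be illustrated with a toy sketch. This is not Microsoft's system: the four-entry lexicon, the role tags, and the single reordering rule below are all hypothetical, chosen only to show word lookup followed by reordering.

```python
# Toy lexicon: a hypothetical, hand-written mapping, not a real translation model.
LEXICON = {
    "I": "我",
    "eat": "吃",
    "apples": "苹果",
    "at home": "在家",
}

def lookup(phrases):
    """Step 1: replace each tagged English phrase with its Chinese equivalent."""
    return [(LEXICON[text], role) for text, role in phrases]

def reorder(phrases):
    """Step 2: move place phrases in front of the verb, as Mandarin usually does."""
    places = [p for p in phrases if p[1] == "place"]
    rest = [p for p in phrases if p[1] != "place"]
    # Index of the first verb; fall back to the end if no verb is tagged.
    verb_index = next((i for i, p in enumerate(rest) if p[1] == "verb"), len(rest))
    return rest[:verb_index] + places + rest[verb_index:]

def translate(phrases):
    """Run both steps and join the result into one Chinese string."""
    return "".join(text for text, _ in reorder(lookup(phrases)))

# "I eat apples at home", pre-tagged by hand with hypothetical role labels.
english = [("I", "subject"), ("eat", "verb"), ("apples", "object"), ("at home", "place")]
print(translate(english))
```

Here the English order (subject, verb, object, place) becomes subject, place, verb, object. A real system would have to learn such mappings and orderings from data rather than rely on hand-written rules.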
"Of course, there are still likely to be errors in both the English text and the translation into Chinese, and the results can sometimes be humorous," Rashid said. "Still, the technology has developed to be quite useful."
Rashid highlighted Microsoft's achievement of enabling an English speaker to present in Chinese in his or her own voice. That feat required a text-to-speech system that Microsoft researchers built using a few hours of speech from a native Chinese speaker and properties of Rashid's own voice taken from an hour of pre-recorded English speeches.
Star Trek-Like Capabilities
"Though it was a limited test, the effect was dramatic, and the audience came alive in response," Rashid said. "When I spoke in English, the system automatically combined all the underlying technologies to deliver a robust speech to speech experience -- my voice speaking Chinese."
Rashid admits that the results are not yet perfect and that much work remains, but he and others hope that within a few years Microsoft will have systems that can completely break down language barriers.
"In other words, we may not have to wait until the 22nd century for a usable equivalent of Star Trek's universal translator, and we can also hope that as barriers to understanding language are removed, barriers to understanding each other might also be removed," Rashid said. "The cheers from the crowd of 2000 mostly Chinese students, and the commentary that's grown on China's social media forums ever since, suggests a growing community of budding computer scientists who feel the same way."
We turned to Rob Enderle, principal analyst at The Enderle Group, for his take on the research. He told us real-time translation has been a struggle because it requires a lot of processing power. But Microsoft is making strong strides.
"With real-time translation, suddenly you can now go places and ask questions and the other person will understand you," Enderle said. "Sentence structure is often very different and so translation can be very difficult as well. The system has to listen to the entire series of words before it can provide an accurate translation. That requires a substantial amount of processing power. But we're closer now than many of us thought."