Computers Are Now Listening Better Than Humans

By Yina Moe-Lange  


Natural Language Processing (NLP) has made large strides in recent years, with Microsoft most recently setting a new speech recognition record. Microsoft researchers reached a 5.1% word error rate with their conversational speech recognition system on the 2000 Switchboard evaluation set. What makes this error rate significant is that the system's accuracy is now on par with that of professional human transcribers.
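For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. Here is a minimal Python sketch of that calculation. The function name and the example sentences are made up for illustration; this is not Microsoft's evaluation code, and real scoring tools also normalize text before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper only: splits on whitespace and does no
    text normalization (casing, punctuation, contractions).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion (the second "the")
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.1%}")
```

By this definition, a 5.1% word error rate works out to roughly five word errors for every hundred words in the reference transcript.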


Last year, the same Microsoft research group reached an error rate of 5.9%. What changed this time is the introduction of an additional model, a convolutional neural network combined with bidirectional long short-term memory (CNN-BLSTM). The adaptation of convolutional neural networks to NLP has played a large role in these milestones. Increases in GPU power and performance have also made training models faster and more effective.


Another important factor in the lower error rate is that the researchers enabled the recognizer to use whole conversations. This is significant because it lets the machine process speech more like humans do: in a conversation, people can predict which words or phrases might come next.


Microsoft points to its Speech Translator service as a direct result of the advances in NLP. The service can translate presentations in real time for multilingual audiences.


Microsoft’s Speech Translator Service. Source: Microsoft Research Blog


NLP has a long history; one can say it formally began in 1950 with Alan Turing’s article, “Computing Machinery and Intelligence.” From this paper emerged the Turing test, which holds that a computer can be considered intelligent if a human can converse with it without realizing it is a machine. Since then, there have been many attempts to improve machine translation and language processing. Most ’90s kids tried out the chatbots on AOL Instant Messenger, and today various chatbots and digital assistants use NLP.


What’s great about this achievement is that these speech recognition abilities are used in actual Microsoft services like Cortana. NLP also applies to far more than the intelligent personal assistants we know today. It can be used to process unstructured text in medical records. Another key use is machine translation, which can improve access to information for people across language barriers. Moving forward, researchers will look to teach machines to understand intent and meaning in speech, which will open up a whole new frontier of uses and applications.