Better speech recognition thanks to Deep Learning

Digital assistants are becoming increasingly sophisticated at recognising speech thanks to deep-learning methods. And, owing to their AI capabilities, they can even predict what their users want.

“Tea, Earl Grey, hot” – every Star Trek fan is familiar with the words Captain Picard uses to order his favourite drink from the replicator. Voice control of computers and spaceships is a staple of science fiction, and attempts to control real machines through speech go back decades: IBM presented the first speech-recognition software for computers to the public in 1984. Some ten years later, the technology reached the PC and thus the mass market. In 2007, Microsoft then used speech recognition in an operating system for the first time with Windows Vista.

Apple was responsible for the breakthrough on the mass market in 2011, when it launched its speech-recognition assistant Siri for the iPhone 4s. Siri now shares the market with a number of similar solutions: Amazon’s Alexa, Microsoft’s Cortana and Google’s Assistant. What all these systems have in common is that the speech input is not processed locally on the mobile device but on the company’s servers: the voice message is sent to a data centre and converted there from spoken to written language. This allows the actual assistant system to recognise commands and questions and respond accordingly. An answer is generated and sent back to the mobile device – sometimes as a data record and sometimes as a finished sound file. Because this requires a fast mobile Internet connection, speech recognition benefits directly from the current trend towards cloud computing and faster mobile networks.
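The client/server round trip described above can be sketched in a few lines. This is a hedged illustration only: `transcribe()` and `answer()` are stand-ins for the provider's server-side models, and nothing here reflects any vendor's real API.

```python
from dataclasses import dataclass

@dataclass
class AssistantResponse:
    text: str    # the answer as a data record (plain text)
    audio: bytes # or as a finished sound file

def transcribe(audio: bytes) -> str:
    """Server side: convert spoken to written language (stubbed here)."""
    return audio.decode("utf-8")  # stand-in for a real speech model

def answer(utterance: str) -> AssistantResponse:
    """Server side: recognise the command and generate a response."""
    reply = f"Playing results for: {utterance}"
    return AssistantResponse(text=reply, audio=reply.encode("utf-8"))

def handle_voice_request(recorded_audio: bytes) -> AssistantResponse:
    """Client side: send the recording to the data centre, get the answer back."""
    utterance = transcribe(recorded_audio)  # speech -> text in the data centre
    return answer(utterance)                # assistant logic runs server-side

print(handle_voice_request(b"tea, earl grey, hot").text)
```

In a real deployment the two server-side functions would sit behind a network API, which is why connection speed matters so much for perceived responsiveness.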

The error rate of speech-recognition systems has decreased significantly from 27% in 1997 to only about 6% in 2016!

Enhanced-quality speech recognition thanks to Deep Learning and Artificial Intelligence

Speech-recognition systems have benefited primarily from Artificial Intelligence in recent years. Self-learning algorithms ensure that machine understanding of speech is improving all the time: according to a 2017 McKinsey study, the error rate of computer-based speech recognition fell from 27 per cent in 1997 to 6 per cent in 2016. Thanks to deep learning, the systems are getting better and better at recognising and learning the speaking patterns, dialects and accents of their users.
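Error rates like the ones cited here are typically reported as word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the recogniser's output into the reference transcript, divided by the reference length. A minimal sketch of the standard Levenshtein-based computation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One word misheard out of four -> 25% error rate
print(word_error_rate("tea earl grey hot", "tea earl gray hot"))  # 0.25
```

On this scale, the drop from 27 to 6 per cent means roughly one word in seventeen is now misrecognised, instead of more than one in four.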

Nuance – whose voice technology, incidentally, is behind Apple’s Siri – was also able to increase the precision of its Dragon speech-recognition solution, launched in 2017, by up to 10 per cent compared with the predecessor version. The software makes consistent use of deep learning and neural networks: on the one hand at the level of the speech model, where the frequency of words and their typical combinations are recorded; on the other hand at the level of the acoustic model, where the phonemes – the smallest spoken units of speech – are modelled. “Deep-learning methods normally require access to a comprehensive range of data and complex hardware in the data centre in order to train the neural networks,” explains Nils Lenke, Senior Director Corporate Research at Nuance Communications. “At Nuance, however, we managed to bring this training directly to the Mac. Dragon uses the specific speech data of the user and is therefore continuously learning. This allows us to increase the precision significantly.”
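To make the "speech model level" concrete – recording the frequency of words and their typical combinations – here is a toy count-based bigram model. This is purely illustrative: Nuance's actual models are neural networks, and the corpus here is invented.

```python
from collections import Counter

# Tiny invented corpus; "." marks sentence boundaries.
corpus = "tea earl grey hot . please make the tea hot . earl grey please".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of word pairs
unigrams = Counter(corpus)                  # counts of single words

def bigram_probability(prev: str, word: str) -> float:
    """P(word | prev) estimated from raw counts (no smoothing)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# "earl grey" is a typical combination in this corpus, "earl hot" is not,
# so a recogniser using this model would prefer "grey" after "earl".
print(bigram_probability("earl", "grey"))  # 1.0
print(bigram_probability("earl", "hot"))   # 0.0
```

A recogniser combines such a language-model score with the acoustic model's score for the phonemes it heard, which is how it picks "earl grey" over an acoustically similar but unlikely word sequence.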

Predictive assistants

AI improves not only speech recognition, however, but also the quality of the services offered by digital assistants such as Alexa, Siri and others, because their ability to learn lets the systems anticipate topics and make recommendations. Microsoft’s Cortana, for example, keeps a notebook – like a human assistant – in which it notes down the user’s interests and preferences, frequently visited locations and quiet hours when the user prefers not to be disturbed. If the user asks about the weather and traffic conditions every day before leaving for work, for example, the system can offer this information on its own after a few repetitions, without the user having to ask.
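The "notebook" idea can be sketched as a simple frequency counter over (context, topic) pairs: once a question has been asked often enough in the same situation, the assistant surfaces it proactively. Cortana's real implementation is not public; the class name, threshold and keys below are illustrative assumptions.

```python
from collections import Counter

class Notebook:
    """Hypothetical sketch of a predictive assistant's memory."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # repetitions before acting proactively
        self.history = Counter()    # (context, topic) -> times asked

    def record_query(self, context: str, topic: str) -> None:
        """Note down what the user asked and in which situation."""
        self.history[(context, topic)] += 1

    def proactive_suggestions(self, context: str) -> list:
        """Topics to surface without being asked, given the current context."""
        return [topic for (ctx, topic), n in self.history.items()
                if ctx == context and n >= self.threshold]

nb = Notebook()
for _ in range(3):  # asked every morning before leaving for work
    nb.record_query("before_work", "weather")
    nb.record_query("before_work", "traffic")
print(nb.proactive_suggestions("before_work"))  # ['weather', 'traffic']
```

After three mornings of identical questions, the sketch starts offering weather and traffic unprompted – the same behaviour the article describes, reduced to its simplest form.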

Voice control of IoT devices

The digital assistants become especially exciting when they are networked with the Internet of Things, since they can then be used to control a whole host of electronic devices. According to market researchers at IHS Markit, more than 5 billion consumer devices will already support digital assistants in 2018, with a further 3 billion to be added by 2021. Even today, for example, a smart home can be operated by voice commands using digital assistants.
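Once an utterance has been transcribed, routing it to a smart-home device is essentially intent matching. The sketch below is a deliberately naive illustration – the device names and keyword matching are invented assumptions, not any real assistant's skill API.

```python
# Map a recognised device word to a handler that performs the action.
DEVICES = {
    "light": lambda action: f"lights turned {action}",
    "thermostat": lambda action: f"thermostat set to {action}",
}

def route_command(utterance: str) -> str:
    """Naive intent matching: find a known device word, then the action."""
    words = utterance.lower().split()
    for device, handler in DEVICES.items():
        if device in words:
            action = words[-1]  # assume the action is the last word
            return handler(action)
    return "sorry, no matching device"

print(route_command("turn the light off"))  # lights turned off
```

Production assistants replace this keyword lookup with trained intent classifiers and slot extraction, but the pipeline shape – transcribe, match intent, dispatch to a device – is the same.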

In the US, Ford has also been integrating the Alexa voice assistant into its vehicles since the start of 2017 – thus incorporating the Amazon App into the car for the first time. Drivers can therefore enjoy audio books at the wheel, shop in the Amazon universe, search for local destinations, transfer these directly to the navigation system, and much more. “Ford and Amazon share the vision that everyone should be able to access and operate their favourite mobile devices and services using their own voice,” explains Don Butler, Executive Director of Ford Connected Vehicle and Services. “Soon our customers will be able to start their cars from home and operate their connected homes when on the go – we are thus making their lives easier step by step.”

And something else that is sure to please Star Trek fans: thanks to Alexa, you can now also order your hot drink with a voice command. Coffee supplier Tchibo, for example, has launched a capsule machine that can be connected to Alexa via WLAN. As a result, you can order your morning coffee from the comfort of your bed: “Coffee, pronto!”
