Voice control: a megatrend | Future Markets Magazine

The quality of voice recognition systems is steadily increasing thanks to developments in AI, semiconductors and microphone technology. Voice assistants are gaining in importance and popularity beyond the smart home environment and are increasingly being used for more sophisticated applications, such as voice-based device control in cars and industry.

“Alexa, turn on the living room light!” is just one of the many commands that smart speakers can execute today. Smart speakers and their voice recognition systems are already the central control point for many smart home functions in many households. Statista reports that Amazon’s Alexa alone can be used to control over 60,000 different smart home devices.

Use of Voice Assistants is Increasing

Human-machine interfaces (HMIs) that function via voice are no longer just a vision from science fiction series like Star Trek or Knight Rider. In the latter, the hero of the series regularly had humorous conversations with his car K.I.T.T. In fact, cars have made the biggest leap in the use of voice-controlled HMIs: for example, according to the industry association Bitkom, almost half of users in Germany already give voice commands to their cars – be it to set a destination in the navigation system, to start a playlist, or to have messages read out. “Automobile manufacturers have massively expanded voice control in vehicles in recent years,” says Dr Sebastian Klöß, an expert in consumer technology at Bitkom. “Voice control not only increases comfort at the wheel but also makes driving safer. Voice assistants will establish themselves as the dominant way to operate the vehicle’s functions on the move.”

Better than a Human

Research into voice recognition systems has been ongoing since the 1950s. The first systems could identify just a single voice and barely a dozen words. It was not until the 2000s that technology advanced enough to make virtual assistants like Google Home or Amazon Alexa possible. Since then, HMIs with voice control have significantly improved – today’s systems recognise words better than a human, achieving a “Word Error Rate” of three to four percent. Humans, on the other hand, typically misunderstand around five percent of words.
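
The Word Error Rate is commonly calculated as the number of word substitutions, deletions and insertions needed to turn the recognised text into the reference transcript, divided by the number of words in the reference. The following Python sketch illustrates that calculation; the function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from any particular voice assistant.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of seven words is misrecognised.
wer = word_error_rate(
    "alexa turn on the living room light",
    "alexa turn on the living room right",
)
print(f"WER: {wer:.1%}")
```

A rate of three to four percent therefore corresponds to roughly one wrong word in every 25 to 33 spoken words.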

Increasing Accuracy Thanks to AI

The accuracy of voice recognition has been greatly improved through the use of artificial intelligence. Machine learning methods, such as deep learning, are used to recognise complex speech patterns, understand natural language and differentiate between different languages.

Fast Reaction Thanks to Edge Processing

Besides accuracy, the speed at which speech is converted into computer-readable commands is crucial – especially when time-critical functions need to be controlled. However, as the amount of data to be processed in voice recognition is enormous, the necessary algorithms for most virtual assistants run in the cloud, that is, in a data centre. This, however, is associated with relatively high latency – the time between issuing the command and its execution. But thanks to immense advances in semiconductor technology, special AI processors and digital signal processors are now available that can process speech directly on the device. The reaction times are correspondingly low, as the data no longer need to be uploaded to the cloud. Dedicated audio-edge processors further increase the efficiency of voice-controlled devices: they act as an energy-efficient wake-up switch that only turns on the power-hungry application processor when a specific keyword is mentioned. They can also take over the task of noise suppression, thereby relieving the main processor of this responsibility.
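
The wake-up principle can be pictured with a small Python sketch. It is a simplified simulation under stated assumptions: the class `AudioEdgeFrontEnd` is hypothetical, and already recognised words stand in for the audio frames that a real audio-edge processor would analyse. The point is only to show that nothing reaches the main processor until the keyword has been heard.

```python
from dataclasses import dataclass, field


@dataclass
class AudioEdgeFrontEnd:
    """Hypothetical low-power stage that gates the application processor."""
    wake_word: str
    awake: bool = False
    forwarded: list = field(default_factory=list)

    def process(self, word: str) -> None:
        # While asleep, everything except the wake word is discarded,
        # so the (simulated) application processor stays switched off.
        if not self.awake:
            if word.lower() == self.wake_word:
                self.awake = True
            return
        # Once awake, the words are passed on for full recognition.
        self.forwarded.append(word)

    def end_of_command(self) -> str:
        # Hand over the collected command and go back to sleep.
        command = " ".join(self.forwarded)
        self.forwarded = []
        self.awake = False
        return command


front_end = AudioEdgeFrontEnd(wake_word="alexa")
for spoken in "some background chatter Alexa turn on the light".split():
    front_end.process(spoken)

command = front_end.end_of_command()
print(command)
```

In this sketch the background chatter before the keyword is never forwarded, which mirrors how the power-hungry processor can remain idle for most of the time.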

Increasingly Powerful Microphones

In addition to digital signal processors, microphone technology is crucial for the accuracy of voice recognition. Microphone arrays, for example, enable the voice recognition system to focus on the user and filter out background noise. This beamforming technology is already used in smart speakers like the HomePod and Echo. Increasingly, MEMS microphones – miniaturised micro-electromechanical systems that are mounted directly on electronic boards – are being used. They are characterised by a high signal-to-noise ratio, low power consumption and high sensitivity. Miniaturisation enables several microphones to be combined in a small space, which is a prerequisite for beamforming, noise suppression and wind noise filtering.
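
One of the simplest forms of beamforming is known as delay-and-sum: because a voice reaches each microphone at a slightly different time, the signals are shifted so that they line up for the desired direction and are then averaged. The Python sketch below illustrates the idea with two microphones and whole-sample delays. It is a minimal illustration, not the method used in any particular smart speaker; the function name `delay_and_sum` and the sample values are assumptions.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its arrival delay (in samples) and average.

    channels: equal-length lists of samples, one per microphone
    delays:   samples by which the target sound arrives late at each microphone
    """
    usable = len(channels[0]) - max(delays)
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(usable)
    ]


def energy(samples):
    return sum(s * s for s in samples)


# Illustrative target sound and a two-microphone array.
target = [0, 1, 0, -1, 0, 1, 0, -1]
mic_near = target                 # the sound arrives here first
mic_far = [0, 0] + target[:-2]    # the same sound, two samples later

# Steered at the speaker: the delays match the real arrival times.
steered = delay_and_sum([mic_near, mic_far], delays=[0, 2])

# Not steered: the channels are averaged without alignment.
unsteered = delay_and_sum([mic_near, mic_far], delays=[0, 0])

print(energy(steered), energy(unsteered))
```

With matching delays the two channels add up to the original sound, whereas without alignment they largely cancel each other out in this example. Sounds from other directions are attenuated in the same way, which is why the array appears to focus on the user.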

Hands-Free in Production

With advancements in hardware and voice recognition algorithms, voice control is now becoming a viable option for application areas that were deemed impossible just a few years ago, such as industrial production, which is characterised by high levels of ambient noise. The Fraunhofer IDMT in Oldenburg has developed a solution in which ambient noise is almost completely filtered out through a combination of directional microphones and effective noise-cancelling. “For the first time ever, we can use voice command technology to control production machines in a robust and intuitive manner,” says Marvin Norda, project manager for Voice Controlled Production at Fraunhofer IDMT. “For manufacturing companies, this means improved efficiency and lower costs.” In the future, machine operators will have both hands free. They can position a workpiece in the work area and simultaneously give a robot instructions, such as “lower arm” or “grip workpiece”.