Voice Application Opportunities for the Audio Industry

July 20 2017, 14:00

In previous editorials, I have highlighted the important role of smart speakers in promoting the use case for voice personal assistants (VPAs) and voice recognition applications - given the ideal combination of far-field microphone arrays with a complete audio ecosystem that encompasses digital signal processing, acoustic echo cancellation, noise cancellation, and many other technologies. This opens significant opportunities for the audio industry, provided we understand the challenges.

Those technologies make complete sense when integrated into a speaker. First, they can be used to enhance music listening, which makes the investment in the object itself easier to justify for consumers less familiar with voice assistants. Second, for a virtual assistant to talk back to us, supply answers to our questions, or simply acknowledge our commands, we actually need a loudspeaker connected to an audio amplification system.

Of course, once users get used to voice commands, to using voice as an interface, or to frequently engaging with virtual assistants, the market will be ready to evolve and integrate those functions in other devices - eventually becoming ubiquitous in any connected IoT device. This opens an opportunity to create a single virtual assistant for the smart home.

What is important in this concept is that users can successfully interact with the VPA, no matter where they are in the home. That means that if you are away from the smart speaker but close to the refrigerator, washing machine, or TV set, any of those devices will be your interface and can listen to your voice requests and commands. But to reply and interact with the user, all those other classes of devices will need a decent audio system, similar to or even more sophisticated than what we have in current smart speakers - sometimes posing significant challenges in terms of environmental conditions.

Simply adding more voice command interactions to every appliance will not make sense to users. First, you don't want to be shouting orders at every single appliance in the home, much less have all those devices misinterpret your commands when they are all listening at the same time. Second, as attractive as it might seem to embed voice command interfaces in every gadget, limited functionality is not exactly what sells the concept - users will expect the same level of consistency they get from a VPA, which will always require connectivity. If users are forced to be close to a device to use voice commands, then they can probably also press a button or make a gesture. The whole "smart" concept is based on the notion that you can control things from a distance, and that those "things" will gain use cases that are not currently available.

*ReSpeaker successfully crowdfunded its open modular voice extension interface that enables interaction with home appliances, Internet-equipped devices, and any other things.*

We all remember the pains of the very first voice-command solutions, including smart TVs... The user experience was bad - and even when it worked, it was essentially futile. You needed a room in almost complete silence, and most of the time something would go wrong - including the TV changing channels while you were watching your favorite sports game. And what was the point of shouting at your TV when you were already in front of it, remote in hand?

Many of the companies involved in voice recognition, along with some of the most sophisticated technologies currently available, were part of those terrible experiences. But lessons were learned. Implementing simple voice commands in appliances is not the solution - as "clever" as it all might sound to the marketing people who just want to sell more coffee machines or expensive refrigerators.

For the smart home concept to be successful, you need fully connected systems with a single VPA that connects all the devices and systems - including different voice recognition engines. Just as smart home solutions need to integrate the thermostat with the air conditioning, heating, ventilation, and window blinds (at least) to effectively control room temperature, so voice recognition engines need to be effective in any part of the house, and need to be coherent.

If the user asks a question, the system will need to know where the user is and from which room the command originated, and then provide the answer visually on the closest screen or audibly through the nearest speaker - also knowing that someone is asleep in another room and doesn't need to hear the response. If you want to see a movie and ask for it from the bedroom, you will see it on the bedroom TV set and you will not activate the complete home entertainment system in the living room downstairs.
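To make that routing idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Device` record, the `pick_output` function, and the `quiet_rooms` set are hypothetical names, and the sketch assumes the system already knows which room the request came from. Replies never leave the originating room, which is one simple way to avoid waking someone elsewhere in the house.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    """Hypothetical description of a connected appliance."""
    name: str
    room: str
    has_screen: bool
    has_speaker: bool

def pick_output(devices, origin_room, wants_visual, quiet_rooms=frozenset()):
    """Return (device, mode) for the reply, or None if the room has no output.

    Assumption: only devices in the room where the request originated are
    considered, so other rooms are never activated.
    """
    local = [d for d in devices if d.room == origin_room]
    screens = [d for d in local if d.has_screen]
    speakers = [d for d in local if d.has_speaker]

    # Visual requests (e.g. "play a movie") go to a screen in the same room.
    if wants_visual and screens:
        return screens[0], "screen"
    # Spoken replies are allowed unless the room itself is marked quiet.
    if speakers and origin_room not in quiet_rooms:
        return speakers[0], "speaker"
    # Otherwise fall back to a silent, on-screen answer if one is available.
    if screens:
        return screens[0], "screen"
    return None

devices = [
    Device("bedroom_tv", "bedroom", has_screen=True, has_speaker=True),
    Device("fridge", "kitchen", has_screen=False, has_speaker=True),
    Device("home_cinema", "living_room", has_screen=True, has_speaker=True),
]
choice = pick_output(devices, "bedroom", wants_visual=True)
print(choice[0].name, choice[1])
```

In this example the movie requested from the bedroom lands on the bedroom TV, and the living room system is never considered.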

The other essential part of the concept will be multiuser. We want the system to be used by the entire family, meaning multiple users (and multiple voice characteristics) must be recognized from any room inside the home, while the VPA itself needs to know how to manage different simultaneous (and possibly contradictory) commands originating from all the users at different source locations.
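One way to picture the handling of contradictory commands is a small arbitration step. The Python sketch below is an assumption-laden illustration, not a description of how any existing VPA works: the `Command` record, the `resolve` function, and the per-user `priority` table are all hypothetical, and the rule chosen here (highest priority wins, ties go to the most recent command) is just one possible policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Command:
    """Hypothetical recognized voice command, already attributed to a speaker."""
    user: str
    room: str
    target: str      # the device or system being controlled
    action: str
    timestamp: float  # seconds, any monotonic clock

def resolve(commands, priority):
    """Keep one command per target.

    Assumed policy: the user with the highest priority wins; if priorities
    are equal, the most recent command wins. Unknown users get priority 0.
    """
    winners = {}
    for cmd in commands:
        rank = (priority.get(cmd.user, 0), cmd.timestamp)
        best = winners.get(cmd.target)
        if best is None or rank > (priority.get(best.user, 0), best.timestamp):
            winners[cmd.target] = cmd
    return winners

priority = {"parent": 2, "child": 1}
commands = [
    Command("parent", "kitchen", "thermostat", "set 20C", timestamp=10.0),
    Command("child", "bedroom", "thermostat", "set 26C", timestamp=11.0),
    Command("child", "bedroom", "bedroom_tv", "play cartoons", timestamp=12.0),
]
for target, cmd in resolve(commands, priority).items():
    print(target, "<-", cmd.user, cmd.action)
```

Commands aimed at different targets do not conflict, so both the thermostat setting and the bedroom TV request survive; only the competing thermostat requests are arbitrated.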

We are still far from this scenario, but that's the only way I can envisage those technologies becoming successful. We cannot create a complete concept such as VPAs and sell the smart home concept while assuming that all potential users live alone and that there will never be other voices around. And let's not even start discussing the dilemma of all the different languages, idioms, accents, etc., which are the largest challenges currently faced by voice recognition technologies, before we even mention machine learning or "artificial intelligence" (in which language was "intelligence" programmed?).

*According to Global Market Insights, the global market for smart speakers represented more than USD 400 million in 2016 and is predicted to grow at an estimated CAGR of over 50% from 2017 to 2024.*

One thing is certain: for voice technologies to be successful, every single accomplishment and successful implementation that we are currently witnessing with the first-generation Amazon Echo, Google Home, Apple HomePod, etc., will essentially be the foundation for those audio systems that will need to be integrated in virtually every appliance - or at least connected with other appliances. In the next few years, we will see many new appliances and even children's toys equipped with full audio and voice systems. Eventually, users will realize that only an integrated solution makes sense, and those systems will become part of a standard home installation.

Meanwhile, many devices will need to be connected, and they will need a microphone, audio engine and speaker solution. That's a lot of MEMS microphone arrays, audio systems, and speakers. For users to truly trust and understand how to use VPA technology (if they do use it), there will be lots of challenges and opportunities for the audio industry. While all the companies in the world are trying, the audio industry will be busy.

About Joao Martins

Since 2013, Joao Martins has led audioXpress as editor-in-chief of the US-based magazine and website, the leading audio electronics, audio product development and design publication, working also as international editor for Voice Coil, the leading periodical for...
