Expanding Horizons in Audio and Voice

September 28 2017, 16:00

It's been an interesting week! Just prior to the IBC 2017 show, which I addressed partially in my previous editorial, Apple held its traditional event for the launch of the new iPhone 8 and iPhone X (and those Augmented Reality demos were amazing!), also debuting the new Apple TV 4K. And before Samsung and all the other smartphone manufacturers start announcing their own versions of "X" models, we had a series of interesting corporate and technology announcements, culminating in Amazon's announcement of a series of new Alexa-based hardware devices, including new Echo speakers that improve audio reproduction.

I'll address some of these topics, since they are directly connected with my focus at the recent IBC show and with current market trends, which I believe are very relevant to our readers. As I said, in Amsterdam I looked at technology trends in the use of voice, not only for voice assistants but also for speech/audio recognition, including speech-to-text, which is essential for closed captioning (subtitles) and is increasingly used for media asset management and content indexing. In these days of cloud infrastructure and completely file-based production, those are becoming vital tools, and it is no wonder that IBC 2017 had many interesting things to see on that front.

I will highlight a company that I know has been doing presentations at all major technology shows, including CES and the Mobile World Congress (MWC). Knowing that they would be at IBC, I paid a visit to their booth and I was simply blown away by their (offline) demo of speech-to-text transcription on an iPad - right in the middle of a noisy show floor. The company is called Speechmatics, and its AI-powered voice transcription service and new real-time virtual appliance, which enables real-time transcription to be adapted to existing systems in a variety of industries, are a breakthrough!

Speechmatics is advancing speech-to-text significantly and its engine will soon be powering many new services. Its AI-powered real-time virtual appliance for real-time voice transcription is a breakthrough!

Founded in 2006, Speechmatics is a Cambridge, UK-based company specializing in Automatic Speech Recognition (ASR). Its speech technology and the Speechmatics software have been pioneered since the 1980s by its founder, Dr Tony Robinson, applying neural networks. Thanks to the evolution in hardware, graphics processing units (GPUs), and cloud computing, the research team was able to deliver on the technology's promise and actually explore commercial applications of speech recognition.

Now, the company is exploring all options for applications, from subtitling and call analytics to meeting transcription and consumer electronics. One of the interesting things I have learned about Speechmatics' real-time engine is that, for the first time, it is able to apply contextual grammatical tools to revise the resulting text, achieving almost perfect transcriptions in the English language. The real-time virtual appliance features adjustable latency and dynamic transcript parameters, adjusting words using contextual analysis as they become clearer to achieve the best accuracy.

According to the Speechmatics team that I talked to, the engine is also able to quickly learn and perfect rules for any other language, and they were already demonstrating examples in Dutch and Japanese. In fact, they claim their AI engine is significantly more accurate than those of their competitors, including Google, IBM, and Microsoft, and has considerably better capability across a wide range of languages. Also, in the demonstrations I've seen, the company was showing the Speechmatics virtual appliance embedded directly into third-party software, without any reliance on the cloud. We are going to hear more about this company. In 2016, they received investments from IQ Capital and Amadeus Capital Partners, and their platform will be commercially available later in 2017, with licensing also available on a per-stream or per-hour basis.

*At the IBC 2017 Awards, Dolby was recognized with the International Honor for Excellence. (Photo, Erik Verheggen)*

Of course, there were also many other interesting cutting-edge technology and content demonstrations at IBC 2017 that deserve a mention, bearing in mind that even if those technologies are not visible to consumers just yet, they are already being successfully implemented in existing production and distribution workflows. Artificial Intelligence (AI) is certainly one of those key technologies, as we've seen from the example of Speechmatics. But voice recognition in general is also what is powering new user interfaces, although I have not yet been able to see any convincing demonstrations in television displays and set-top boxes, despite having seen many references to it.

In a very pleasant chat with Vineet Ganju, Conexant's Vice President, Voice and Audio, who was promoting demonstrations with Synaptics of integrated voice for set-top boxes (STBs), I learned that there are significant efforts in place, but that companies in the media and broadcast space are very reluctant to adopt existing voice personal assistants from Amazon, Google, or Microsoft, and many are developing their own engines. Significantly, the demonstrations in the Synaptics room were being powered by Alexa. Current designs in this space are exploring all options, including voice remotes and even soundbars or connected speakers that can be placed closer to the user.

And as elsewhere in the audio industry, all the big media companies are exploring new options for personalization of content, including the use of second screens (they are still around) with AR, chatbots, immersive media, and holography. This is an area where object-based audio will find ample space for expansion, and the main reason why we should expect interesting new possibilities in the near future. As demonstrated by the experimental MPEG-H transmissions, immersive audio and loudness control are just the beginning.

Significantly, this year the IBC 2017 Awards recognized Dolby with the International Honor for Excellence, reflecting on more than 50 years of continual improvement in sound. Craig Todd, CTO, Dolby Laboratories, accepted the award on the company's behalf from Naomi Climer, IBC Council Chair and Past IET President.

*Comcast Xfinity X1 is one of the best known examples of voice remotes and demanded a huge effort to implement.*

And talking about awards, in the past few days the National Academy of Television Arts & Sciences (NATAS) announced the recipients of the 69th annual Technology & Engineering Emmy Awards. The actual ceremony will take place only in April 2018, during the National Association of Broadcasters (NAB) annual NAB Show (Sunday, April 8, 2018) in Las Vegas, NV. Among the 2017 Technical/Engineering Achievement Awards, it is very significant that four companies have been chosen for their pioneering efforts in voice applications and interfaces: Comcast, Universal Electronics (UEI), Apple, and Nuance Communications.

Comcast was recognized with a Technology & Engineering Emmy for "Contextual Voice Navigation for Discovering and Interacting with TV Content," honoring the technology teams that developed the Xfinity X1 Voice Remote and the innovative software platform that powers it. Comcast's voice remote combines Artificial Intelligence, Machine Learning, and Natural Language Processing with a powerful, cloud-based platform. In 2016 alone, customers gave more than 3.4 billion voice commands. Today, there are nearly 17 million X1 Voice Remotes in customers' homes.

Universal Electronics, Inc. (UEI) was recognized for its work on voice navigation technologies for discovering and interacting with TV content. UEI was selected for its "excellence in engineering and creativity that has materially affected the television viewing experience." UEI has "diligently expanded optimized voice remote offerings with state-of-the-art microprocessors, tightly coupled with software and acoustic design to enhance the accuracy of daily voice interactions." UEI currently has more than 40 million voice-enabled remote controls in use across North America, Europe, and Asia.

Nuance Communications' Dragon TV won a Technology & Engineering Emmy Award for its "intuitive voice-activated navigation platform that empowers users to seamlessly search and discover content using just their voice." Leveraging advances in deep learning and natural language understanding, Dragon TV is designed to give service operators and manufacturers the ability to easily integrate and deliver a conversational user experience for smart and connected TVs. Nuance's embedded and cloud-based voice technologies are currently integrated in many remotes, TVs, set-top boxes, or speakers, in several languages.

*Apple was awarded a Technology & Engineering Emmy Award for its Apple TV Siri Remote and in particular its ability to rewind and access content with simple voice commands.*

Finally, the most surprising announcement of all was the Technology & Engineering Emmy Award conferred on Apple for its Apple TV Siri Remote. A relative newcomer in this space - two years in the market with this technology - Apple introduced its 2015 version of the Apple TV with a remote that features Siri, enabling users to search for content with voice. The Emmy recognizes in particular the ability of the Apple TV to find and play exactly what the user requested with voice commands, including advanced replay capabilities for recorded or live content. Using just voice, Siri on Apple TV allows users to rewind content at any moment by simply asking, "What did he say?" This is a significant implementation, considering the need to integrate hardware and software, and it solves a very simple need in a natural way. In the best Apple tradition.

Significantly, Apple just updated the Apple TV to support 4K image, HDR, Dolby Vision, and Dolby Atmos, and now users can even request: "Show me movies in 4K." And the new Apple TV 4K comes with support for AirPlay 2, allowing users to control multiple AirPlay 2-compatible speakers as well as home theater speakers.

Article originally published on The Audio Voice newsletter.
Register here to receive The Audio Voice weekly: http://bit.ly/1ri0b4J

About Joao Martins

Since 2013, Joao Martins has led audioXpress as editor-in-chief of the US-based magazine and website, the leading audio electronics, audio product development and design publication, working also as international editor for Voice Coil, the leading periodical for...
