Audio Software Quo Vadis? Artificial Intelligence and Machine Learning Used in Audio Tools

December 5, 2018, 17:10
There was a time when recording and editing/mixing audio and video on a computer meant choosing your workstation, your favorite hardware audio platform (usually in PC board format), and buying your software and plug-ins, which meant you would end up with a proud shelf of “sexy” software boxes with 11 lbs of manuals (easily!). Your main dilemmas – after how much money you could spend, obviously – were then related to the choice of Mac or Windows software, and that's an area I don't even want to touch upon given the seriousness of the debate.
Adobe Audition might not be the most popular audio software out there, but it provides a comprehensive audio post solution, now with next-generation AI-based audio cleanup technologies and a modern - and much faster - multitrack environment.

I just have to mention the Mac, because Apple was the disruptor in the creative content production industry. It began when Apple introduced its family of production tools, built on the acquisition of the Macromedia development team responsible for Final Cut and the acquisition of Emagic, the creators of Logic (among lots of other great audio and music software). The move quickly proved highly successful. Logic Pro was always one of the top digital audio workstation (DAW) contenders in any studio, and around 2008, Final Cut Pro represented half of the professional editing market globally. Then Apple started to focus on iOS, iPhone, and iPad, and updated both of its video editing tools, iMovie and Final Cut Pro, with a disastrous interface change (yes, never get rid of the timeline!) that basically alienated almost all of its users in just a few years (myself included). Logic Pro was not as affected, but it was gradually converted into a sort of "pro" iteration of GarageBand for hip-hop production and library-based music production (and frankly, most users forgot about it, simply because Apple also did...).
With Apple abandoning the space without ever explaining why, Avid and Adobe were back as the top contenders in the content production market. The problem was that for many creative users, changing interfaces and software is the last thing they want to do, and moving platforms was not exactly an easy process. Avid and Adobe always tried to keep their software available for both Mac and Windows (with a few exceptions resulting from acquisitions and legacy software), and the considerable existing Windows user base helped both companies gradually take over what was left of Apple's market inroads.
Meanwhile, Apple had created the online App Store, which was never considered something for the professional market because of the higher prices, the fact that companies needed valid purchase receipts from recognized local entities for tax purposes, or simply because professionals like to have software on physical media to install on machines that are never connected to the Internet, for safety. Still, gradually, all software from Apple was made exclusively available on the App Store, and prices were greatly reduced as well (by as much as 80%) to make them compatible with online purchases.
Avid, a company with a deeply troubled corporate history, started to transition all its software, including the market leader Pro Tools, to a cloud-based, online cooperative production model, even if it never totally killed perpetual software licenses. Today, Avid offers all possible options for Pro Tools, from monthly subscriptions to annual plans and a $599 perpetual license, paid upfront with a physical box (a buy-it-once, own-it-forever model).
Immediately after it became clear that Apple was exiting the professional creative content market, Adobe moved even faster. The company transitioned into a completely different model, from a documents (PDF), multimedia, and creativity software company into the e-commerce, cloud services, business analytics, advertising, and marketing behemoth that it is today. That meant moving its software business to a subscription model, which the company effectively did years ago. And Adobe did this without ever reducing prices. In fact, it made its Creative Suite family of products even more expensive on a monthly subscription basis, in order to encourage professionals and companies to sign up for automatically renewed annual subscriptions (or memberships, as they are now called).
After successfully completing that transition (for many creative professionals, such as those in magazine publishing, there wasn't really an alternative), Adobe started to add online services to those licenses, making its business model stickier and much more resilient. By adding things such as cloud storage and online collaboration, creating a platform for freelancers to sell services online, and offering free online media resources as part of the subscription (e.g., stock photography and fonts), it becomes very hard to ever get away from the ecosystem, and many clients upgrade from just a few core software tools to the entire software package. With this policy, Adobe is certainly the company best positioned to lead in professional creative software. With one exception... and that's in audio!
When we look at Adobe's portfolio of solutions, which includes the famous Acrobat (for PDF), Photoshop, Premiere Pro (video editing), and After Effects (visual effects and animation), in a family of 21 core software programs, there is one for audio: Adobe Audition. Adobe Audition, software used for audio recording, mixing, and restoration, was acquired from Syntrillium Software in 2003 and was previously known as Cool Edit Pro.
Durin Gleaves is Product Manager for Audio at Adobe.

This is a rather long prologue to justify why I decided to interview Adobe's Audio Product Manager, Durin Gleaves, just before the company made the big announcement about its latest updates to the Creative Cloud, previewed at the IBC 2018 show in Amsterdam. I really wanted to understand how Adobe was positioning its Audition software, other than as a secondary tool for Premiere Pro, and what exactly the updates were in the latest release of what the company now calls its Creative Cloud Video and Audio Tools.
An overview of what Audition can and cannot do would take much more space than I have available here. I highly recommend our Audio Editing Software Roundup, published in four parts from December 2017 to March 2018 by our colleague and music software expert, Fernando Rodrigues, in which he reviews Audition in comparison to other software options. As Fernando Rodrigues explains, feature for feature, Audition doesn't even compete with heavyweights such as Pro Tools, instead competing with relatively unknown tools like 2nd Audio Rsample, Acon Digital Acoustica, Ivosight Soundop, Magix Sound Forge, and SoundIt! 8 Pro (most of which are also much more affordable).
For the latest 2018 release of Creative Cloud, Adobe focused on making its new generation of artificial intelligence (AI)-powered features, which the company calls Adobe Sensei, available across all its software. These are features that essentially help users accelerate mechanical tasks, such as auto lip-syncing an animation with performance-captured mouth movements and spoken sounds - a very smart and cool feature. Specifically for Audition, the new Auto-Ducking feature uses machine learning to automatically lower soundtrack volume during spoken dialog. There is also new Intelligent Audio Cleanup (shared with Premiere Pro): Reduce Noise and Reduce Reverb sliders in the Essential Sound panel make removing reverb and background noise easier than ever. Noise removal and cleanup was always a strong area for Audition, but the process was very time consuming, and there are now many algorithmic plug-ins available that do it better and faster. Inside Audition and Premiere, though, these enhancements are certainly welcome. Audition is now more tightly integrated with Premiere Pro and other tools than ever, and that's great for video editors.
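To give a sense of the mechanical task Auto-Ducking automates, here is a minimal sketch of a classic envelope-follower side-chain ducker - not Adobe's machine-learning implementation, just the generic technique. The function name, threshold, and gain values are illustrative assumptions:

```python
import numpy as np

def duck(music, dialog, sr=48000, threshold=0.01, duck_gain=0.25,
         window_ms=50, smooth_ms=50):
    """Lower `music` wherever `dialog` carries energy (simple side-chain duck).

    threshold -- RMS level above which dialog counts as "speech present"
    duck_gain -- gain applied to music during speech (0.25 is roughly -12 dB)
    """
    win = max(1, int(sr * window_ms / 1000))
    # Per-sample RMS envelope of the dialog track
    env = np.sqrt(np.convolve(dialog ** 2, np.ones(win) / win, mode="same"))
    # Target gain: full level when quiet, ducked when speech is present
    target = np.where(env > threshold, duck_gain, 1.0)
    # One-pole smoothing so the gain ramps instead of clicking
    alpha = np.exp(-1.0 / (sr * smooth_ms / 1000))
    gain = np.empty_like(target)
    g = 1.0
    for i, t in enumerate(target):
        g = alpha * g + (1 - alpha) * t
        gain[i] = g
    return music * gain
```

A real implementation (and presumably Adobe's) replaces the fixed RMS threshold with a trained speech detector, which is what keeps music from ducking on non-dialog sounds.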
I should also point out that Adobe often integrates tools from other companies into its own software, when it's a capability Adobe doesn't offer and there is a clear market reference and specialized software it can support. That's what they've recently done with 3D animation, by supporting Maxon's Cinema 4D software. But that becomes hard to fit within the membership model, as I'll expand on later.
Improvements and tuning to Audition’s playback and recording engine should now support up to 128 tracks of playback, or simultaneously record 32 tracks, at low latencies with standard modern workstations - and performance scales from there by adding faster SSD drives and audio interfaces.

Looking at the latest Adobe Audition CC, it clearly isn't a fully featured DAW, and certainly not a tool for music production (it doesn't support MIDI or virtual instruments), and as exciting as auto-ducking can be for editors, it's not something that's going to win new audio users for Adobe. It certainly is a more powerful version under the hood, allowing for 400% faster mixdowns and bounces and more precise surround panning, among other enhancements.
Probably a good example of how Adobe approaches audio updates is the fact that this version introduces support for the good old Mackie HUI protocol, finally allowing Audition users to use current control surfaces and consoles, including support for HUI-enabled timecode display and control devices (20 years later...). Also, funnily enough, the 2018 Character Animator CC update introduces MIDI support for action triggers.
These are obviously welcome enhancements, considering that you get them automatically with your subscription, but they would not encourage anyone to pay for an upgrade if this were 1998. As Adobe explains in the announcement, "improvements we put in each release... often come from feature requests from customers and users and are specific solutions to real problems." So there.
When I asked Durin Gleaves to give me an idea of Adobe's focus for these Audition updates, he was quick to mention "simplification" and "collaboration," because as part of the Creative Cloud, users are no longer working on just one application.
"Audition has a lot of adoption from people that are working on video editing, are visual designers, or are creating a podcast to promote a brand. There are a lot of different users and different skill levels, people for whom audio is not their expertise, but they need to create something with sound."

On the other hand, he also explained that they want to create tools that are more powerful but simpler to use. I think that characterizes very well what currently separates the smaller, specialized software houses, which make incredibly powerful and deep software (with an extensive learning curve in most cases), from large software companies (e.g., Adobe), which design tools used for media production in general. Integrating machine learning and AI to make tasks easier and faster for those users is now a completely new focus, and something that I think will determine the way creative tools are defined from now on.
As Gleaves explains, it simply is no longer possible to teach every user every single manual task (I agree, and can attest from my own extensive usage of Adobe's software). So, the software offers "automatic" features side by side with the traditional options to manually adjust things in detail and experiment as much as needed, without being intimidating. Equally important is allowing the software foundation to scale, and allowing users to grow in their ability to explore and make the best use of the tools that are available. The fact that the software (and the platform it runs on) is now much more powerful and faster also helps in making these features "automatic" and highly effective, of course.
And as Durin explains, it does the processing faster and at the local level (CPU-based) without taking processes to the cloud. So "users can be on a desert island and apply noise-removal or reverb reduction to a hundred clips with low latency (milliseconds) and low overhead, for a project running on a laptop." 
The new DeReverb tool in Adobe Audition can dramatically improve audio by reducing unwanted echo from a clip, using adaptive algorithms that apply real-time adjustments based on the specific characteristics of each sound clip. Like DeReverb, the new DeNoise algorithm is trained using machine learning and gets better over time.
Deconvolution (as the industry calls it), or reverb reduction (as Adobe calls it), is also extremely useful for the current usage profile. People record sound, voice-overs, or dialog in a home office or in a quiet conference room at work, but there is always interference from lights, air-conditioning noise, and multi-reflective surfaces, all generating bad sound that needs heavy processing to sound like it was captured in a proper, acoustically treated environment. And the reason Adobe's new reverb reduction process is effective on those sources is "machine learning, previously trained on thousands and thousands of hours of clean and reverb-heavy recordings to learn the difference. That was a really exciting breakthrough," as Durin enthusiastically confirms.
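For context on why the ML approach is a breakthrough, the classical technique this kind of cleanup replaces is spectral gating: estimate a noise profile from a silent stretch of the recording, then attenuate spectral bins toward that profile. The sketch below shows that baseline (the function name and parameters are illustrative; this is not Adobe's trained algorithm, which needs no silent stretch at all):

```python
import numpy as np

def spectral_gate(x, sr=48000, frame=1024, hop=512, noise_secs=0.5, floor=0.1):
    """Classical spectral gating: estimate a noise profile from the first
    `noise_secs` of the clip (assumed speech-free), then subtract that
    profile from each frame's spectrum, keeping a small gain floor."""
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    # Short-time spectra of overlapping, windowed frames
    spec = np.array([np.fft.rfft(x[s:s + frame] * window) for s in starts])
    mags = np.abs(spec)
    # Noise profile: mean magnitude over the assumed-silent lead-in
    n_noise = max(1, int(noise_secs * sr / hop))
    noise_mag = mags[:n_noise].mean(axis=0)
    # Per-bin gain: subtract the noise estimate, never below the floor
    gain = np.maximum(1.0 - noise_mag / np.maximum(mags, 1e-12), floor)
    spec = spec * gain
    # Overlap-add resynthesis (Hann at 50% overlap sums to unity)
    out = np.zeros(len(x))
    for i, s in enumerate(starts):
        out[s:s + frame] += np.fft.irfft(spec[i], n=frame)
    return out
```

The weakness of this baseline is exactly what the article describes: it needs a clean noise sample, it assumes the noise is stationary, and aggressive settings produce the familiar "underwater" artifacts - which is why training on hours of clean versus degraded recordings is a meaningful step forward.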
And as we discussed, next on the list for Adobe could be using the same approach for many other things, such as voice removal from a soundtrack or a song - things the software currently allows but that can be greatly improved with machine learning. I used this song reference to ask whether Adobe would ever envisage turning Audition into a fully featured music production DAW, to which Durin replied, "Our focus now is recorded performance. If you have a live band, and everyone is playing instruments and you are recording each of those separately on different tracks, we're fantastic for mixing and mastering those productions. We're not a MIDI composition environment right now. That said - and this is a comment from me, not reflecting Adobe's position - I'm not really interested in making another piano roll MIDI editor. There's a whole bunch of those - some are absolutely fantastic, they've been around for 20 years, with wonderful features and workflows built-in. And honestly, it's not that big of a market for Adobe. What does interest me is the future of music production. A lot of what we do is, ‘How can we make creation of something technical, simpler, and more approachable for somebody who doesn't have the skills, the training, and the background?’ Those people should be able to realize their musical vision, and we are seeing some advances (again) in artificial intelligence, that can really take your idea, either a musical idea, or taking just the genre, or a certain style or mood, and really help to create music algorithmically that is musical, coherent and fits the timing and your production. We don't have anything at this time that we're showing, but it is certainly something that we are looking into. We are doing research and we are paying attention to the industry around us."
Interestingly, the Adobe model for the Creative Cloud subscription makes it possible for users to benefit from new features, and especially new AI-based tools such as speech-to-text transcription, without paying extra or buying extra products. While many software companies currently propose AI "as a service" (basically pay-per-use) - and I saw that very often among broadcast technology manufacturers this year - Adobe can again leverage those AI features to increase the perceived value of its subscription model. Of course, that's valid for products developed in house or technology Adobe acquired. If, as with transcription engines, Adobe is licensing the technology, it becomes harder to integrate it into the workflow... for free. Adobe is working on getting text from video automatically transcribed and indexed in the cloud to increase the production possibilities, especially for large media companies and broadcasters. If the integration is there, all users/subscribers can benefit. As Durin puts it, "I think we can do better."
Of course, I also had to ask about Adobe's plans for immersive audio and object-based audio production, only to find, again, that Adobe is not exactly ahead of the curve and that, as on other fronts, they will react based on demand from their users. As Durin confessed, "We have been making progress in immersive video, and we started to dabble in immersive audio, with binaural and Ambisonics. Object-based audio really makes that a larger experience, so it's something that we're looking into, but we don't have any specific announcements yet." So, I had to ask about the recently published Netflix production guidelines, which specifically mention MPEG-H. Durin was quick to acknowledge that reference. "It's really exciting! Organizations like Netflix are creating the next studio system, and it's really starting to inform even legacy broadcasters... And like it happened with broadcasters transitioning to the cloud really quickly, we'll also have to adapt," he stated.

And since he mentioned the cloud, we discussed how Adobe sees the use of audio production and post-production tools in the cloud. Adobe took Audition to broadcast years ago, reacting to market demand, when that market was transitioning to cloud-based operations. It seems that (as Avid also concluded at the time) the industry was looking to finally embrace concurrent, online collaboration workflows, not just store files remotely: workflows where audio, graphics, and all the other departments don't have to wait for the final version of the video edit to start working. Today, everyone works concurrently on the same project, making changes until the very last minute, just before going to air or getting distributed - getting away from the old thinking of "reel," EDL-based production, and "interchange" formats, directly into real-time online collaboration, which can only be done in the cloud. "We really want life without an interchange. To have collaborative, transparent workflows. Where there is A project. With Premiere as the application that's editing the video, and Audition as the application that's mixing and editing the audio, either one can see the changes and collaborate in real time, synchronized on the same project. Together, without those interchange middle steps. Right now, that's been focused on the file-based project, but once that foundation is in place, then we can leverage things like team projects where it's all cloud-based and everyone can collaborate - with infinite history and undo. That is going to be a revolution in post-production."
Next up, as we further discussed, comes the introduction of those AI features to power collaborative workflows in the cloud... The AI topic is certainly a fascinating one to follow because of its many implications, including the market intelligence generated directly from media. As Adobe recently stated in its Black Friday report, "Adobe leverages Adobe Sensei, Adobe's artificial intelligence and machine learning technology, to identify retail insights from trillions of data points that flow through Adobe Analytics and Magento Commerce Cloud, part of Adobe Experience Cloud." Think about that.
Adobe Premiere Rush CC all-in-one cross-device app.

Another fascinating topic is multiplatform and online integration. The next chance I have, I will expand on something very exciting coming from Adobe, initially called Project Rush: the first all-in-one, cross-device video editing and online sharing app that runs on any desktop or mobile platform. Adobe has already launched Premiere Rush CC and recently showed a future version of Photoshop for Apple's latest iPad (which is now more powerful than most Intel-based workstations), and there are many more examples of that new generation of creation software and integrated services coming from Adobe.