“Solving the so-called ‘cocktail party problem’ has been the holy grail of speech processing for more than 50 years: we finally cracked it,” says Jonathan Le Roux, Ph.D., Principal Research Scientist at MERL and one of the leading researchers on the project.
In tests, the simultaneous speech of two speakers and of three speakers was separated with up to 90 percent and 80 percent accuracy, respectively. The technology, developed using Mitsubishi Electric’s proprietary “deep clustering” method based on artificial intelligence (AI), is expected to contribute to more intelligible voice communications and more accurate automatic speech recognition. The approach is versatile: voices can be separated regardless of the speakers’ language or gender.
Mitsubishi Electric will explore opportunities to apply this new speech separation technology to improve the quality of voice communications and the accuracy of automatic speech recognition in real environments, such as cars, homes and elevators.
“Until now, there has been no effective method to accurately reconstruct the speech of multiple unknown speakers recorded with just one microphone,” says Richard Waters, Ph.D., president, CEO and founding member of Mitsubishi Electric Research Laboratories.
Cambridge, Massachusetts-based Mitsubishi Electric Research Laboratories, the North American research arm of Mitsubishi Electric Corporation, is home to some of the world’s leading experts in areas such as electronics and communications, multimedia, data analytics, computer vision, mechatronics and algorithms, and holds more than 1,200 patents.