Our team presented a tutorial entitled “Foundations, Extensions and Applications of Statistical Multichannel Speech Separation Models” at Interspeech 2023.

This tutorial aims to show audio and speech researchers interested in source separation and speech enhancement how to formulate a physics-aware probabilistic model that explicitly represents the generative process of observed audio signals (the direct problem) and how to derive its maximum likelihood estimator (the inverse problem) in a principled manner. Under mismatched conditions and/or with limited training data, the separation performance of supervised methods can degrade drastically in the real world, as is often the case with deep learning-based methods that work well in controlled benchmarks. We first show that state-of-the-art blind source separation (BSS) methods can work comparably to or even better than such methods in the real world and play a vital role in drawing out the full potential of deep learning-based methods. Second, the tutorial introduces how to develop an augmented reality (AR) application for smart glasses with real-time speech enhancement and recognition of target speakers.
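To make the direct-problem/inverse-problem framing concrete, here is a minimal toy sketch (not taken from the tutorial itself): the generative model is an instantaneous two-channel mixture of independent Laplacian sources, and the inverse problem is solved by whitening followed by a grid search over rotations that maximizes non-Gaussianity, a simple ICA-style stand-in for the statistical multichannel models covered in the tutorial. All variable names and parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Direct problem: generate independent non-Gaussian (Laplacian) sources
# and mix them with a fixed matrix A that plays the role of the "physics".
s = rng.laplace(size=(2, n))            # latent sources (assumed model)
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # mixing matrix (illustrative)
x = A @ s                               # observed two-channel mixture

# Inverse problem: whiten the observation, then search for the rotation
# that maximizes non-Gaussianity (here, total absolute excess kurtosis).
x = x - x.mean(axis=1, keepdims=True)
cov = x @ x.T / n
d, E = np.linalg.eigh(cov)
z = (E / np.sqrt(d)) @ E.T @ x          # whitened observation

best_theta, best_score = 0.0, -np.inf
for theta in np.linspace(0.0, np.pi / 2, 181):
    c, s_ = np.cos(theta), np.sin(theta)
    W = np.array([[c, s_], [-s_, c]])   # candidate unmixing rotation
    y = W @ z
    score = np.abs(np.mean(y**4, axis=1) - 3.0).sum()
    if score > best_score:
        best_theta, best_score = theta, score
```

Because whitening reduces the remaining ambiguity to a rotation (up to sign and permutation), the one-dimensional grid search suffices in this two-source toy case; the tutorial's models replace this contrast function with an explicit likelihood derived from the generative model.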

In the tutorial, I presented adaptive speech enhancement systems for AR smart glasses. The following is one of the demo videos.