NF-IVA
A. A. Nugraha, K. Sekiguchi, M. Fontaine, Y. Bando, and K. Yoshii, "Flow-Based Independent Vector Analysis for Blind Source Separation," IEEE Signal Process. Lett., vol. 27, pp. 2173-2177, 2020.
Abstract
This paper describes a time-varying extension of independent vector analysis (IVA) based on the normalizing flow (NF), called NF-IVA, for determined blind source separation of multichannel audio signals. As in IVA, NF-IVA estimates demixing matrices that transform mixture spectra to source spectra in the complex-valued spatial domain such that the likelihood of those matrices for the mixture spectra is maximized under some non-Gaussian source model. While IVA performs a time-invariant bijective linear transformation, NF-IVA performs a series of time-varying bijective linear transformations (flow blocks) adaptively predicted by neural networks. To regularize such transformations, we introduce a soft volume-preserving (VP) constraint. Given mixture spectra, the parameters of NF-IVA are optimized by gradient descent with backpropagation in an unsupervised manner. Experimental results show that NF-IVA successfully performs speech separation in reverberant environments with different numbers of speakers and microphones and that NF-IVA with the VP constraint outperforms NF-IVA without it, standard IVA with iterative projection, and improved IVA with gradient descent.
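The objective outlined in the abstract can be illustrated with a minimal NumPy sketch. It assumes a spherical Laplace source model and a squared log-determinant penalty as the soft VP term. Both are illustrative assumptions rather than the paper's exact formulation, and the function names are hypothetical. The first function scores one demixing matrix per frequency bin (time-invariant IVA); the second scores one effective matrix per time-frequency bin, standing in for the composition of flow blocks that the neural networks would predict.

```python
import numpy as np

def iva_neg_log_likelihood(W, X):
    """Time-invariant IVA cost (negative log-likelihood up to a constant).

    W: (F, M, M) complex demixing matrices, one per frequency bin.
    X: (F, T, M) complex mixture spectra (F bins, T frames, M channels).
    Assumption: spherical Laplace source model.
    """
    Y = np.einsum("fnm,ftm->ftn", W, X)            # Y[f, t] = W[f] @ X[f, t]
    r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))    # (T, M) norm over frequency
    _, logabsdet = np.linalg.slogdet(W)            # (F,) log|det W[f]|
    T = X.shape[1]
    return np.sum(r) - 2.0 * T * np.sum(logabsdet)

def time_varying_cost(W_ft, X, vp_weight=1.0):
    """Same cost with one demixing matrix per time-frequency bin.

    W_ft: (F, T, M, M) complex matrices (here given directly; in NF-IVA they
    would be predicted by neural networks).
    Assumption: the soft volume-preserving penalty is mean(log|det W|^2),
    which is zero when every matrix has |det| = 1.
    """
    Y = np.einsum("ftnm,ftm->ftn", W_ft, X)        # Y[f, t] = W_ft[f, t] @ X[f, t]
    r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))    # (T, M)
    _, logabsdet = np.linalg.slogdet(W_ft)         # (F, T)
    nll = np.sum(r) - 2.0 * np.sum(logabsdet)
    vp_penalty = np.mean(logabsdet ** 2)
    return nll + vp_weight * vp_penalty
```

When `W_ft` repeats the same matrix over all frames and `vp_weight` is 0, the second cost reduces to the first, which reflects the abstract's statement that NF-IVA extends IVA to the time-varying case.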
Reference
A. A. Nugraha, K. Sekiguchi, M. Fontaine, Y. Bando, and K. Yoshii, “Flow-Based Independent Vector Analysis for Blind Source Separation,” IEEE Signal Processing Letters, vol. 27, pp. 2173-2177, 2020, doi: 10.1109/LSP.2020.3039944.
Audio Samples
- All methods considered below perform determined blind source separation (BSS): they estimate as many source images as there are observation channels (microphones), irrespective of the actual number of sources. Therefore, as post-processing for the overdetermined cases (more microphones than speakers), we keep the estimated source images with the highest average power.
- For listening, all audio files are stereo, obtained by taking the first two channels of the estimated multichannel source images.
- For simplicity, AuxIVA uses 64 parameter updates and all other methods use 2048, although these settings might be suboptimal.
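The post-processing described in the notes above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the array layout, and the choice of keeping as many sources as there are speakers (`n_keep`) are assumptions.

```python
import numpy as np

def select_and_downmix(est, n_keep):
    """Keep the highest-power source images and reduce them to stereo.

    est: (N, C, L) real array with N estimated source images,
         C channels each, and L time-domain samples.
    n_keep: number of source images to keep (assumed here to be the
            number of speakers in the mixture).
    Returns an (n_keep, 2, L) array.
    """
    power = np.mean(est ** 2, axis=(1, 2))          # average power per source image
    top = np.argsort(power)[::-1][:n_keep]          # indices of the strongest sources
    keep = np.sort(top)                             # preserve the original ordering
    return est[keep, :2, :]                         # first two channels as stereo
```

For example, with 4 microphones and a 2-speaker mixture, `select_and_downmix(est, 2)` would return the two strongest of the four estimated source images, each as a two-channel signal.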
2-speaker separation
- Sample set 1: wsj0-2mix -- 2speakers_reverb -- tt -- 22go010c_2.1682_051o0212_-2.1682
- Sample set 2: wsj0-2mix -- 2speakers_reverb -- tt -- 446o030r_1.7325_420c020z_-1.7325
wsj0-2mix -- 2speakers_reverb -- tt -- 22go010c_2.1682_051o0212_-2.1682
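Mixture | Sources |
---|---|
[audio] | [audio] |

Methods | 2 microphones | 4 microphones | 6 microphones | 8 microphones |
---|---|---|---|---|
AuxIVA - 64 param. updates | [audio] | [audio] | [audio] | [audio] |
IVA-BP - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 1 flow block - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 2 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 4 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 1 flow block - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 2 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 4 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |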
wsj0-2mix -- 2speakers_reverb -- tt -- 446o030r_1.7325_420c020z_-1.7325
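Mixture | Sources |
---|---|
[audio] | [audio] |

Methods | 2 microphones | 4 microphones | 6 microphones | 8 microphones |
---|---|---|---|---|
AuxIVA - 64 param. updates | [audio] | [audio] | [audio] | [audio] |
IVA-BP - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 1 flow block - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 2 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 4 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 1 flow block - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 2 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 4 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |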
3-speaker separation
- Sample set 1: wsj0-3mix -- 3speakers_reverb -- tt -- 22go010n_0.046171_444c020e_-0.046171_442o030y_0
- Sample set 2: wsj0-3mix -- 3speakers_reverb -- tt -- 446o030o_2.0593_445c020s_-2.0593_423c020w_0
wsj0-3mix -- 3speakers_reverb -- tt -- 22go010n_0.046171_444c020e_-0.046171_442o030y_0
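Mixture | Sources |
---|---|
[audio] | [audio] |

Methods | 3 microphones | 4 microphones | 6 microphones | 8 microphones |
---|---|---|---|---|
AuxIVA - 64 param. updates | [audio] | [audio] | [audio] | [audio] |
IVA-BP - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 1 flow block - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 2 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 4 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 1 flow block - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 2 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 4 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |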
wsj0-3mix -- 3speakers_reverb -- tt -- 446o030o_2.0593_445c020s_-2.0593_423c020w_0
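Mixture | Sources |
---|---|
[audio] | [audio] |

Methods | 3 microphones | 4 microphones | 6 microphones | 8 microphones |
---|---|---|---|---|
AuxIVA - 64 param. updates | [audio] | [audio] | [audio] | [audio] |
IVA-BP - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 1 flow block - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 2 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (NVP) - 4 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 1 flow block - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 2 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |
NF-IVA (VP) - 4 flow blocks - 2048 param. updates | [audio] | [audio] | [audio] | [audio] |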