Published September 2, 2019 | Version v1
Conference paper
Cauchy Multichannel Speech Enhancement with a Deep Speech Prior
Contributors
Others:
- Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH) ; Centre Inria de l'Université de Lorraine ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD) ; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA) ; Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA) ; Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Télécom ParisTech
- RIKEN Center for Advanced Intelligence Project [Tokyo] (RIKEN AIP) ; RIKEN - Institute of Physical and Chemical Research [Japon] (RIKEN)
- Signal, Statistique et Apprentissage (S2A) ; Laboratoire Traitement et Communication de l'Information (LTCI) ; Institut Mines-Télécom [Paris] (IMT)-Télécom Paris ; Institut Mines-Télécom [Paris] (IMT)-Institut Polytechnique de Paris (IP Paris)-Institut Polytechnique de Paris (IP Paris)-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris ; Institut Mines-Télécom [Paris] (IMT)-Institut Polytechnique de Paris (IP Paris)-Institut Polytechnique de Paris (IP Paris)
- Département Images, Données, Signal (IDS) ; Télécom ParisTech
- University of Tokyo [Kashiwa Campus]
- Scientific Data Management (ZENITH) ; Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM) ; Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Centre Inria d'Université Côte d'Azur (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
- This work was partly supported by the research programme KAMoulox (ANR-15-CE38-0003-01), funded by ANR, the French state agency for research, and by JSPS KAKENHI No. 19H04137.
- ANR-15-CE38-0003, KAMoulox, Démixage en ligne de larges archives sonores (2015)
Description
We propose a semi-supervised multichannel speech enhancement system based on a probabilistic model that assumes both speech and noise follow the heavy-tailed multivariate complex Cauchy distribution. As we advocate, this allows the system to handle strongly adverse noise conditions. The model is parameterized by the source magnitude spectrograms and the source spatial scatter matrices. To deal with the non-additivity of scatter matrices, our first contribution is to perform the enhancement in a projected space. Our second contribution is to combine a latent variable model for speech, trained within the variational autoencoder framework, with a low-rank model for the noise source. At test time, an iterative inference algorithm estimates the parameters used for separation. The speech latent variables are first estimated from the noisy speech and then updated by gradient descent, while a majorization-equalization strategy updates the noise parameters and the spatial parameters of both sources. Our experimental results show that the Cauchy model outperforms state-of-the-art methods. The standard deviations of the scores also indicate that the proposed method is more robust to non-stationary noise.
Abstract
International audience
Additional details
Identifiers
- URL
- https://telecom-paris.hal.science/hal-02288063
- URN
- urn:oai:HAL:hal-02288063v1
Origin repository
- Origin repository
- UNICA