Emmanuel Vincent - Speech and audio processing in everyday environments

Location: 
Organized by: 

Michel Vacher

Speaker: 

Emmanuel Vincent (Inria Nancy - Grand Est)

Teams: 

Speech recognition remains a challenging goal in everyday environments involving multiple background sources and reverberation. The popular "pipeline" approach involves two steps: (1) separating the target speech signal from the noise signal, and (2) applying a conventional speech recognizer to the enhanced signal.

In the first part of my talk, I will present a statistical modeling framework for audio source separation which makes it possible to jointly exploit various pieces of information about the sources and the environment. I will provide sound examples for the separation of speech vs. noise.

In the second part of the talk, I will argue that the "pipeline" approach yields suboptimal results because errors made in the first step propagate to the second step. I will introduce the uncertainty handling framework, which aims to replace the deterministic signal passing through the pipeline with a full posterior distribution quantifying the confidence or the uncertainty in each part of the separated signal. I will present some results obtained within that framework.
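
As a rough illustration of the uncertainty handling idea, and not necessarily the method presented in the talk, the sketch below assumes a scalar feature whose posterior after separation is Gaussian, and an acoustic model with one Gaussian per class. Under those assumptions, integrating the class likelihood over the feature posterior simply adds the feature variance to the model variance. All names and numbers (`uncertainty_log_likelihood`, the class means, the variances) are hypothetical and chosen only for illustration.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """Log-density of a scalar Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def uncertainty_log_likelihood(feat_mean, feat_var, model_mean, model_var):
    """Class log-likelihood marginalized over a Gaussian feature posterior.

    The integral of N(x; feat_mean, feat_var) * N(x; model_mean, model_var)
    over x equals N(feat_mean; model_mean, model_var + feat_var).
    """
    return gaussian_log_likelihood(feat_mean, model_mean, model_var + feat_var)


# Hypothetical two-class acoustic model with unit variance per class.
MODEL_VAR = 1.0
MEAN_A, MEAN_B = 0.0, 3.0

# Hypothetical enhanced feature produced by the separation step.
enhanced = 2.0


def margin(feat_var):
    """Log-likelihood advantage of class B over class A."""
    return (uncertainty_log_likelihood(enhanced, feat_var, MEAN_B, MODEL_VAR)
            - uncertainty_log_likelihood(enhanced, feat_var, MEAN_A, MODEL_VAR))


# feat_var = 0 is the deterministic pipeline (point estimate only).
margin_point = margin(0.0)
# feat_var = 4 means the separation step reports low confidence here.
margin_uncertain = margin(4.0)

print(f"margin with point estimate: {margin_point:.3f}")
print(f"margin with uncertainty:    {margin_uncertain:.3f}")
```

With a zero feature variance the recognizer trusts the enhanced feature fully; with a large variance the margin between the two classes shrinks, so an unreliable part of the separated signal has less influence on the decision.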