Ngoc Tien Le - Advanced Quality Measures For Speech Translation

13:00

Monday

Jan

2018

Thesis defence

Place:

IMAG Building Amphitheatre

Organized by:

Ngoc Tien Le

Speaker:

Ngoc Tien Le

Teams:

GETALP

Keywords:

Quality estimation
Word confidence estimation (WCE)
Spoken Language Translation (SLT)
Joint Features
Feature Selection

Jury composition:

M. Yannick Estève
Professeur, Laboratoire d’Informatique de l’Université du Maine (LIUM), Le Mans Université, Reviewer
M. Georges Linarès
Professeur, Laboratoire Informatique d’Avignon (LIA), Université d’Avignon, Reviewer
M. Frédéric Béchet
Professeur, Laboratoire d’Informatique Fondamentale de Marseille (LIF), Aix-Marseille Université, Examiner
M. Laurent Besacier
Professeur, Laboratoire d’Informatique de Grenoble (LIG), Université Grenoble Alpes, Supervisor
M. Benjamin Lecouteux
Maître de conférences, Laboratoire d’Informatique de Grenoble (LIG), Université Grenoble Alpes, Co-supervisor

The main aim of this thesis is to investigate the automatic quality assessment of spoken language translation (SLT), called Confidence Estimation (CE) for SLT. For several reasons, SLT output may be of unsatisfactory quality, which can cause various issues for the target users. It is therefore useful to know how confident we are in the tokens of the hypothesis. The first contribution of this thesis is the LIG-WCE toolkit, a customizable, flexible and portable framework for Word-level Confidence Estimation (WCE) of SLT. WCE for SLT is a relatively new task, formalized as a sequence tagging problem in which each word of the SLT output is assigned one of two labels (good or bad) on the basis of a large feature set. We propose several word confidence estimators based on automatic evaluation of transcription (ASR) quality, translation (MT) quality, or both (joint ASR+MT). We built a corpus of 6.7k utterances, each represented by a quintuplet consisting of the ASR hypothesis, the verbatim transcript, the text translation, the speech translation and the post-edition of the translation. Our WCE experiments using joint ASR and MT features show that MT features remain the most influential, while ASR features can bring interesting complementary information.

As another contribution, we propose two methods to disentangle ASR errors from MT errors, in which each word in the SLT hypothesis is tagged as good, asr_error or mt_error. We thus explore how WCE for SLT can help identify the source of SLT errors.

Furthermore, we propose a simple extension of the WER metric that penalizes substitution errors differently according to their context, using word embeddings. For instance, the proposed metric should catch near matches (mainly morphological variants) and penalize this kind of error less, since it has a more limited impact on translation performance. Our experiments show that the proposed metric correlates better with SLT performance than WER does. Oracle experiments are also conducted and show the ability of our metric to find better hypotheses (to be translated) in the ASR N-best list. Finally, we present and analyze a preliminary experiment in which ASR tuning is performed with our new metric.
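
The idea of an embedding-aware WER can be illustrated with a minimal sketch. It is not the metric defined in the thesis: it assumes that the substitution cost is simply one minus the cosine similarity of the two word vectors (clipped to [0, 1]), that insertions and deletions keep a cost of 1, and that words without an embedding fall back to the standard cost of 1. It also ignores the surrounding context of each word. The function names and the toy `embeddings` dictionary are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def substitution_cost(ref_word, hyp_word, embeddings):
    """Assumed cost: 0 for identical words, 1 - cosine similarity otherwise."""
    if ref_word == hyp_word:
        return 0.0
    if ref_word not in embeddings or hyp_word not in embeddings:
        return 1.0  # no vector available: behave like plain WER
    similarity = cosine(embeddings[ref_word], embeddings[hyp_word])
    return min(1.0, max(0.0, 1.0 - similarity))


def embedding_weighted_wer(reference, hypothesis, embeddings):
    """Edit distance with soft substitution costs, normalized by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the cost of aligning the processed reference prefix
    # with the first j hypothesis words.
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [float(i)] + [0.0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitute = prev[j - 1] + substitution_cost(ref[i - 1], hyp[j - 1], embeddings)
            delete = prev[j] + 1.0
            insert = curr[j - 1] + 1.0
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[len(hyp)] / len(ref)


# Toy two-dimensional vectors, invented for illustration only.
embeddings = {
    "cat": [1.0, 0.0],
    "cats": [0.6, 0.8],
    "dog": [0.0, 1.0],
}

near_match = embedding_weighted_wer("the cat sat", "the cats sat", embeddings)
far_match = embedding_weighted_wer("the cat sat", "the dog sat", embeddings)
```

With these toy vectors, the morphological variant ("cats" for "cat") is penalized less than the unrelated substitution ("dog" for "cat"), which plain WER would score identically. Passing an empty `embeddings` dictionary reduces the function to standard WER.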

To conclude, we have proposed several prominent strategies for CE of SLT that could have a positive impact on a range of SLT applications. Robust quality estimators for SLT output can be used to provide feedback to the user in computer-assisted speech-to-text scenarios or to re-score ST graphs.