Ngoc Quang Luong - Word Confidence Estimation for Statistical Machine Translation

13:00

Wednesday

Nov

2014

Thesis defence

Place:

Campus - ENSIMAG - Amphi H

Organized by:

Ngoc Quang Luong

Speaker:

Ngoc Quang Luong

Jury:

Prof. Catherine BERRUT, University of Grenoble, France, President

Prof. Kamel SMAÏLI, University of Lorraine, Nancy, France, Reviewer

Prof. Lucia SPECIA, University of Sheffield, United Kingdom, Reviewer

Assoc. Prof. Guillaume WISNIEWSKI, University Paris-Sud XI, Paris, Examiner

Prof. Laurent BESACIER, University Joseph Fourier, Grenoble, Supervisor

Assoc. Prof. Benjamin LECOUTEUX, University Pierre Mendès France, Grenoble, Co-supervisor

Machine Translation (MT) systems, which automatically generate a translation in a target language for each source sentence, have achieved impressive gains in recent decades and are now becoming effective language assistants for the entire community in a globalized world. Nonetheless, due to various factors, MT quality is still not perfect in general, and end users therefore want to know how much they should trust a specific translation. Building a method that is capable of pointing out the correct parts, detecting the translation errors and assessing the overall quality of each MT hypothesis is definitely beneficial not only for end users, but also for translators, post-editors, and MT systems themselves. Such a method is widely known as Confidence Estimation (CE) or Quality Estimation (QE). The motivation for building such automatic estimation methods originates from the actual drawbacks of assessing MT quality manually: this task is time-consuming, effort-costly, and sometimes impossible when the readers have little or no knowledge of the source language.

This thesis mostly focuses on CE methods at the word level (WCE). The WCE classifier assigns a quality label to each word in the MT output. The WCE working mechanism is straightforward: a classifier, trained beforehand on a number of features using ML methods, computes the confidence score of each label for each MT output word, then tags the word with the highest-scoring label. Nowadays, WCE is of increasing importance in many aspects of MT. Firstly, it helps post-editors quickly identify translation errors, hence improving their productivity. Secondly, it informs readers of the portions of a sentence that are not reliable, to avoid misunderstanding of the sentence’s content. Thirdly, it selects the best translation among options from multiple MT systems. Last but not least, WCE scores can help to improve MT quality via several scenarios: N-best list re-ranking, search graph re-decoding, etc.
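The tagging step described above can be illustrated with a minimal sketch. It assumes the classifier has already produced one score per label for each word; the label names `good` and `bad`, the function name `tag_words`, and the example scores are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def tag_words(
    words: List[str], label_scores: List[Dict[str, float]]
) -> List[Tuple[str, str]]:
    """Tag each MT output word with its highest-scoring quality label.

    label_scores[i] maps each candidate label to the confidence score
    that a (hypothetical, already trained) classifier gave to words[i].
    """
    if len(words) != len(label_scores):
        raise ValueError("need exactly one score dictionary per word")
    # For every word, keep the label whose confidence score is largest.
    return [(w, max(s, key=s.get)) for w, s in zip(words, label_scores)]


# Illustrative scores only; real scores would come from the trained classifier.
words = ["the", "cat", "sat"]
scores = [
    {"good": 0.9, "bad": 0.1},
    {"good": 0.3, "bad": 0.7},
    {"good": 0.6, "bad": 0.4},
]
tagged = tag_words(words, scores)
```

Here `tagged` pairs each word with a single label, which is the form a post-editing interface or a downstream re-ranking step would consume.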

In this thesis, we aim at building and optimizing our baseline WCE system, then exploiting it to improve MT and Sentence Confidence Estimation (SCE). Compared to previous approaches, our novel contributions cover the following main points. Firstly, we integrate various types of prediction indicators: system-based features extracted from the MT system, together with lexical, syntactic and semantic features, to build the baseline WCE systems. We also apply multiple Machine Learning (ML) models to the entire feature set and then compare their performance to select the best one to optimize. Secondly, the usefulness of all features is investigated in more depth using a greedy feature selection algorithm. Thirdly, we propose a solution that exploits the Boosting algorithm as a learning method in order to strengthen the contribution of dominant feature subsets to the system, thus improving the system’s prediction capability. Lastly, we explore the contributions of WCE to improving MT quality via several scenarios. In N-best list re-ranking, we synthesize scores from WCE outputs and integrate them with decoder scores to recompute the objective function value, then re-order the N-best list to choose a better candidate. In the decoder’s search graph re-decoding, the proposal is to apply the WCE score directly to the nodes containing each word, updating their cost according to the word quality. Furthermore, WCE scores are used to build useful features, which can enhance the performance of the Sentence Confidence Estimation system.
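The N-best list re-ranking idea can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how WCE outputs are synthesized or combined with the decoder score, so the mean word confidence, the linear combination, the `wce_weight` parameter, and all names and numbers below are hypothetical.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    text: str
    decoder_score: float           # model score from the decoder (higher is better)
    word_confidences: List[float]  # assumed per-word WCE confidence in [0, 1]


def rerank(nbest: List[Hypothesis], wce_weight: float = 1.0) -> List[Hypothesis]:
    """Re-order an N-best list by decoder score plus a weighted WCE score.

    The sentence-level WCE score is assumed here to be the mean word
    confidence; the thesis may synthesize its scores differently.
    """
    def combined(h: Hypothesis) -> float:
        if h.word_confidences:
            wce = sum(h.word_confidences) / len(h.word_confidences)
        else:
            wce = 0.0
        return h.decoder_score + wce_weight * wce

    # Highest combined score first.
    return sorted(nbest, key=combined, reverse=True)


# Illustrative values only.
nbest = [
    Hypothesis("hyp a", -2.0, [0.2, 0.3]),
    Hypothesis("hyp b", -2.2, [0.9, 0.8]),
]
best = rerank(nbest)[0]
```

With `wce_weight=0.0` the ranking falls back to the decoder's own order, so the weight controls how much the word-level confidence is allowed to change the chosen candidate.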

In total, our work brings an insightful and multidimensional picture of word quality prediction and its positive impact on various sectors of Machine Translation. The promising results open up a broad avenue where WCE can play a role, such as WCE for Automatic Speech Recognition (ASR) systems (when combined with ASR features), WCE for multiple MT selection, and WCE for re-trainable and self-learning MT systems.