Maha Elbayad - Rethinking the Design of Sequence-to-Sequence Models for Efficient Machine Translation

Organized by: 
Maha Elbayad

It is my great pleasure to announce that my PhD thesis defense will be held on June 22nd at 14:00 over video-conference.
The thesis is entitled "Rethinking the Design of Sequence-to-Sequence Models for Efficient Machine Translation" and was conducted within the LIG (Getalp team) and the INRIA (Thoth team) under the supervision of Laurent Besacier and Jakob Verbeek.
Unfortunately, the defense will not be open to the public and, sadly, will not be followed by the traditional “pot”.


Jury committee:

  • Hermann Ney, Professor, RWTH Aachen University, Reviewer.
  • Holger Schwenk, Research director, Facebook AI research, Reviewer.
  • Marine Carpuat, Assistant professor, University of Maryland, Examiner.
  • Francois Yvon, Senior Researcher, LIMSI-CNRS, Examiner.
  • Laurent Besacier, Professor, LIG, Thesis director.
  • Jakob Verbeek, Research director, Facebook AI research / INRIA, Thesis co-director.

In recent years, deep learning has enabled impressive achievements in Machine Translation. Neural Machine Translation (NMT) relies on training deep neural networks with a large number of parameters on vast amounts of parallel data to learn how to translate from one language to another. One crucial factor in the success of NMT is the design of new powerful and efficient architectures. State-of-the-art systems are encoder-decoder models that first encode a source sequence into a set of feature vectors and then decode the target sequence conditioned on the source features. In this thesis we question the encoder-decoder paradigm and advocate for an intertwined encoding of the source and target so that the two sequences interact at increasing levels of abstraction. For this purpose, we introduce Pervasive Attention, a model based on two-dimensional convolutions that jointly encode the source and target sequences with interactions that are pervasive throughout the network. To improve the efficiency of NMT systems, we explore online machine translation, where the source is read incrementally and the decoder is fed partial contexts so that the model can alternate between reading and writing. We investigate deterministic agents that guide the read/write alternation through a rigid decoding path, and introduce new dynamic agents that estimate a decoding path for each sample. We also address the resource efficiency of encoder-decoder models and posit that going deeper in a neural network is not required for all instances. We design depth-adaptive Transformer decoders that allow for anytime prediction, together with sample-adaptive halting mechanisms that favor low-cost predictions for low-complexity instances and save deeper predictions for complex scenarios.
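
To make the idea of a rigid read/write decoding path concrete, here is a minimal sketch of one possible deterministic agent. It assumes a wait-k style policy (read k source tokens, then alternate one write and one read until the source is exhausted). The abstract does not name this policy, and the function name `wait_k_path` and its parameters are hypothetical, chosen only for illustration.

```python
def wait_k_path(source_len, target_len, k):
    """Return the read ('R') / write ('W') actions of a wait-k style path.

    Illustrative assumption: before writing target token t (0-indexed),
    the agent must have read min(k + t, source_len) source tokens.
    """
    actions = []
    read = 0      # number of source tokens read so far
    written = 0   # number of target tokens written so far
    while written < target_len:
        needed = min(k + written, source_len)
        # Read until enough source context is available for the next write.
        while read < needed:
            actions.append("R")
            read += 1
        actions.append("W")
        written += 1
    return actions


if __name__ == "__main__":
    # 4 source tokens, 4 target tokens, k = 2
    print("".join(wait_k_path(4, 4, 2)))  # prints RRWRWRWW
```

The path here depends only on the sequence lengths and k, never on the content of the sentence, which is what makes it rigid. A dynamic agent would instead choose each read or write action per sample.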