Georgios Balikas - Mining and Learning from Multilingual Text Collections using Topic Models and Word Embeddings

Organized by: 
Georgios Balikas


Jury members:

  • Cyril Goutte, senior researcher at the National Research Council Canada, reviewer
  • Gaël Dias, professor at the Université de Caen, reviewer
  • Laurent Besacier, professor at the Université Grenoble Alpes, examiner
  • Patrick Gallinari, professor at the Université Pierre et Marie Curie, examiner
  • Guillaume Vernat, researcher, Coffreo, examiner
  • Massih-Reza Amini, professor at the Université Grenoble Alpes, thesis advisor


In this thesis, we focus on learning text representations based on the distributional hypothesis, which states that linguistic items with similar distributions have similar meanings. In the first part of the thesis, we consider probabilistic topic models for monolingual and bilingual text corpora. We identify some limitations of these models, for instance the fact that they do not account for text structure, and we propose ways to alleviate them. The second part of the thesis focuses on word embeddings, that is, continuous word representations learned with neural networks. We investigate different settings of text classification and document retrieval problems, and we propose algorithms that benefit from the expressiveness of word embeddings, using either deep neural networks or a reformulation of the optimal transport problem.