Théo Trouillon - Complex-Valued Embedding Models for Knowledge Graphs

12:30

Friday

Sep

2017

Thesis defence

Place:

IMAG Building Amphitheatre

Organized by:

Théo Trouillon

Speaker:

Théo Trouillon

Jury:

M. Yves Grandvalet, Research Director at the the French National Center for Scientific Research, Reviewer
M. Volker Tresp, Professor at the Ludwig-Maximilian University of Munich, Reviewer
M. Andrew McCallum, Professor at the University of Massachusetts Amherst, Examiner
M. Mohamed Nadif, Professor at the Paris Descartes University, Examiner
M. Éric Gaussier, Professor at University Grenoble Alps, Director
M. Christopher Dance, Research Fellow at Naver Labs Europe, Co-Advisor

The explosion of widely available relational data in the form of knowledge graphs enabled many applications, including automated personal agents, recommender systems and enhanced web search results. The very large size and notorious incompleteness of these databases calls for automatic knowledge graph completion methods to make these applications viable. Knowledge graph completion, also known as link prediction, deals with automatically understanding the structure of large knowledge graphs (labeled directed graphs) to predict missing entries (the labeled edges). An increasingly popular approach consists in representing a knowledge graph as a 3rd-order tensor, and using tensor factorization methods to predict their missing entries.
State-of-the-art factorization models propose different trade-offs between modeling expressiveness, time and space complexity, and generalization abilities. We introduce a new model, ComplEx, for Complex Embeddings, to reconcile expressiveness, complexity and generalization through the use of complex-valued factorization. We corroborate our approach theoretically and show that all possible knowledge graphs can be exactly decomposed by the proposed model. Our approach based on complex embeddings is arguably simple, as it only involves a complex-valued trilinear product, whereas other methods resort to more and more complicated composition functions to increase their expressiveness. The proposed ComplEx model is scalable to large data sets as it remains linear in both space and time, while consistently outperforming alternative approaches on standard link-prediction benchmarks.