Bart Thijs - The application of NLP in quantitative science studies

Organized by:
Cyril Labbe
Speaker:
Bart Thijs - KU Leuven – ECOOM
Teams:
The application of NLP in quantitative science studies


Bart Thijs is a Senior Researcher in Bibliometrics at KU Leuven – ECOOM. His research focuses on the integration of social network analysis and Natural Language Processing into the common framework available in Quantitative Science Studies. He tries to identify global and local network structures and dynamics within scientific communication and studies their implications for the development and application of citation-based indicators. In his work, he combines a background in Mathematical Psychology (Master’s degree from KU Leuven) and Social Sciences (PhD from Leiden University) with professional experience as a statistical consultant. He teaches a course on Social Network Analysis in the Big Data option of the Master of Artificial Intelligence program at KU Leuven and has supervised several internships on Computational Linguistics within the same master’s program.



“What are the recent breakthroughs in my field?”, “Who is active in emerging topics in our national science system?”, “Which research topics have become obsolete?” These are typical questions raised by researchers, university administrators, or policy makers. Quantitative science studies tries to provide scientifically grounded answers to these questions. An important data source is the overwhelming body of scientific publications. To extract the relevant information from these publications, Natural Language Processing has entered the field of Quantitative Science Studies. These techniques offer undeniable advantages over traditional, purely citation-based approaches, but not without substantial additional costs.
In this seminar, I’ll discuss the application of common NLP techniques, along with their advantages and challenges. POS-tagging has been applied to extract noun phrases from both abstracts and full texts. Sentiment analysis is used to classify references as either positive or negative. LDA and Word2Vec are applied in topic detection. Named Entity Recognition enables the identification of main actors, applied methodologies or techniques, and chemical compounds or proteins.

Given the applied nature of the presentation, it may also be of interest to researchers active in innovation management, as well as to research coordinators and other staff working in research management.