Teresa Lynn - Developing the Irish Dependency Treebank

Organized by: 
Lorraine Goeuriot
Dr Teresa Lynn, ADAPT Research Centre, Dublin City University


Dr. Teresa Lynn is a Research Fellow at the ADAPT Research Centre at Dublin City University. Her PhD was awarded in 2016 under a cotutelle agreement between Dublin City University and Macquarie University, Sydney. Teresa's main interests lie in developing tools and resources for Irish language technology. She is the principal investigator on the GaelTech project, which covers research topics such as Irish treebank and parser development, techniques for parsing Irish multiword expressions, and the processing of Irish-language text on social media. Other project activities include the ELRC (European Language Resource Coordination), ELRI (European Language Resource Infrastructure) and the Universal Dependencies Project. Teresa also worked in industry for several years in the areas of localisation, NLP and machine translation.


Statistical parsers are data-driven and rely on the availability of syntactically annotated corpora (treebanks), from which they learn the syntactic patterns of a given language. Treebanks are costly in terms of both development time and the skills required. For this reason, low-resourced languages often lack both treebanks and statistical parsers.
In this talk I will report on the development of the first Irish dependency treebank and syntactic parser. I will discuss the linguistic structures of Irish, itself a low-resourced language, and the motivation behind the design of the final dependency annotation scheme. I will also show how we examined methods such as Active Learning to semi-automate treebank development. Through empirical methods, we see the impact that our treebank's size and content have on parsing accuracy for Irish. I will also briefly discuss our involvement in the Universal Dependencies Project.