MSc

Supporting material for my MSc dissertation, "Expanding the Open Wordnets for English and Portuguese to Geology Domain: Inclusion of Lithology and Geological Time Concepts". It is available at: http://bibliotecadigital.fgv.br/dspace/bitstream/handle/10438/29846/AlexandreTessarolloMSc.pdf

Abstract: Human knowledge has been stored, transferred and built upon by written means. The human ability to tap into this source is by far the main reason why we’ve been able to advance our collective understanding. Over a quarter century ago, our technologies for collecting, storing, and disseminating vast amounts of information had gotten ahead of our technologies for collating and analyzing it. Natural Language Processing (NLP) tackles this issue. Everyday life already benefits from NLP, with applications ranging from spam filtering to (limited) support chatbots and artificial intelligence assistants interacting through voice commands. When it comes to technical language, however, NLP has some shortcomings. This is particularly true for the Oil&Gas domain, where information is the most precious resource, one that supports decisions worth billions of dollars. Even though there are numerous reports, papers, documents and the like, such knowledge remains untapped due to NLP domain limitations. It is our hypothesis that expanding a lexical resource, namely WordNet, will have a scalable effect, particularly on Word Sense Disambiguation (WSD) and on overall NLP for Oil&Gas domain documents. To verify this, we extended WordNet with 377 new concepts (synsets), 558 new lexical forms (words) and 948 new relations (pointers) between such words and/or synsets. This extension focuses on two of the most common references mentioned in Oil&Gas documents: Geological Time and Lithology (the branch of geology devoted to rocks). We perform this extension both “vertically”, from the original Princeton WordNet in English into the Open WordNet for English (OWN-EN), and “horizontally”, by translating and adapting that effort to the Open WordNet for Portuguese (OWN-PT). We then compare the outputs of the WSD algorithm UKB before and after the extension. Both WordNet extensions (English and Portuguese) are available as online open-source initiatives.