How to Read Tea Leaves: A Hands-on Guide for Semantic Validation of Text Models using Oolong

📍 Online-only event

📆 May 05, 2021, 13:45-15:15

Abstract

The growing supply of unstructured text is a great opportunity, but also a challenge, for social science. In many instances we want to classify, scale or compare text for which no prelabeled data is available. In these cases, unsupervised learning techniques such as topic models, or the use of dictionaries, promise automated analysis of text with little or no human input. But these models are notoriously difficult to evaluate. While the validation of the statistical properties of topic models is well established, the substantive meaning of the categories they uncover is often less clear, and their interpretation relies on "intuition" or "eyeballing". Computer scientists call this "reading tea leaves". The story for dictionary-based methods is no better: researchers usually assume these dictionaries have built-in validity and use them directly in their research. Oolong gives applied users in disciplines such as political science and communication science a set of tools to objectively judge substantive interpretability. It allows standardized, content-based testing of topic models as well as dictionary-based methods, with clear numeric indicators of semantic validity. This session is a hands-on guide on how to create and administer your own tests.

Presenter

Marius Sältzer is a doctoral researcher in political science at the University of Mannheim. His research revolves around the dimensions of political conflict, for example the question of which issues matter to the public, political parties and their constituencies. To answer these questions, he studies the political communication of legislators, parties and other key political actors, with a special emphasis on political elites' use of social media.