ODSC-APAC-2021-Tutorial

NLP without a ready-made dataset.


How to do NLP When You Don’t Have a Labeled Dataset?

This is the GitHub repository for the ODSC-APAC 2021 tutorial session "How to do NLP When You Don’t Have a Labeled Dataset?"

An overview of this topic can be found on the ODSC blog (August 2021).

Abstract:
Lack of a readily available dataset is a common scenario in industry projects involving NLP. It is also a situation that researchers venturing into new problems or new languages often encounter. However, traditional textbooks, as well as tutorials and workshops, primarily focus on modeling and deploying models. In this workshop, I will introduce some strategies to create labeled datasets for a new task and to build your first models with that data. By the end of this session, participants are expected to come away with some ideas for solving the data bottleneck in their organizations. The target audience is data scientists, as well as those involved in requirements gathering for a given NLP problem.
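One widely used way to bootstrap labels without an annotated dataset is weak supervision: write a few heuristic rules ("labeling functions") that each vote for a label or abstain, then combine their votes. The sketch below is illustrative only and is not taken from the tutorial materials. It assumes a hypothetical binary complaint-detection task, and the function names, keywords and the simple majority vote are all assumptions. Libraries such as Snorkel replace the majority vote with a learned label model.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical label scheme for a binary task: complaint vs. not a complaint.
ABSTAIN: Optional[int] = None
NOT_COMPLAINT = 0
COMPLAINT = 1

LabelingFunction = Callable[[str], Optional[int]]


def lf_refund_keyword(text: str) -> Optional[int]:
    """Hypothetical rule: mentioning a refund suggests a complaint."""
    return COMPLAINT if "refund" in text.lower() else ABSTAIN


def lf_broken_keyword(text: str) -> Optional[int]:
    """Hypothetical rule: 'broken' or 'not working' suggests a complaint."""
    lowered = text.lower()
    return COMPLAINT if "broken" in lowered or "not working" in lowered else ABSTAIN


def lf_thanks_keyword(text: str) -> Optional[int]:
    """Hypothetical rule: expressions of thanks suggest no complaint."""
    return NOT_COMPLAINT if "thank" in text.lower() else ABSTAIN


def weak_label(text: str, lfs: List[LabelingFunction]) -> Optional[int]:
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [vote for vote in (lf(text) for lf in lfs) if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # If the two most frequent labels have equal counts, the rules disagree: abstain.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


LFS: List[LabelingFunction] = [lf_refund_keyword, lf_broken_keyword, lf_thanks_keyword]

if __name__ == "__main__":
    unlabeled = [
        "My phone arrived broken, I want a refund.",
        "Thank you for the quick delivery!",
        "When does the store open?",
        "Thanks, but the app is not working.",
    ]
    for text in unlabeled:
        print(weak_label(text, LFS), "|", text)
```

Texts that receive a label can seed a first training set, while abstentions and ties point to examples worth labeling by hand.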

What is what in this repo:

LICENSE: CC0-1.0 License.
README.md: This file.
requirements.txt: Generated using pipreqs.