Text tagging

Introduction

This project reviews the standard methods for text tagging and experiments with extending the approach proposed in Universal Language Model Fine-tuning for Text Classification (ULMFiT) to this task. The modifications are integrated into a local copy of the fastai library.

Index

The files and folders contained in this repo are:

fastai/ directory: Contains version 1.0.31 of the fastai library, modified to include text tagging.
ULMFiT_approach: A notebook that runs the Labeler (with working results) and shows some of the functions integrated into the library.
Data_preprocessing_visualization_new.ipynb: A notebook with the data preprocessing and visualization used for the presentation.
final_project_checkin_template.ipynb: A notebook with the first machine learning model fitting.
baseline_optimization.ipynb: A notebook with the grid search and pipeline used to tune the machine learning algorithms.

Extending the approach proposed in ULMFiT to this task is still an ongoing project. A working version has been built, but the model's results still need to be improved.

Major issues

While applying ULMFiT to text tagging, we ran into a major issue with using pre-trained models for this task: the tokenization of the upstream task, which is generally shared across several downstream tasks, needs to match the tokenization provided with the downstream task. Otherwise the tokens of the text no longer line up with their labels.
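
One possible way to work around this mismatch, shown here only as a minimal sketch and not as the method used in this repo, is to re-tokenize each labeled token with the upstream tokenizer and repeat its label over the resulting pieces. The names toy_upstream_tokenizer and realign_labels are hypothetical, and the regex tokenizer is just a stand-in for whatever tokenizer the pre-trained language model actually uses.

```python
import re
from typing import Callable, List, Tuple


def toy_upstream_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in for the language model's tokenizer.

    Splits runs of word characters from individual punctuation marks.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def realign_labels(
    tokens: List[str],
    labels: List[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Re-tokenize each labeled token and copy its label to every piece."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    new_tokens: List[str] = []
    new_labels: List[str] = []
    for token, label in zip(tokens, labels):
        pieces = tokenize(token)
        new_tokens.extend(pieces)
        # Every piece inherits the label of the token it came from.
        new_labels.extend([label] * len(pieces))
    return new_tokens, new_labels


# Downstream data: one label per token, as provided with the dataset.
tokens = ["New-York", "is", "big", "."]
labels = ["LOC", "O", "O", "O"]

aligned_tokens, aligned_labels = realign_labels(
    tokens, labels, toy_upstream_tokenizer
)
print(list(zip(aligned_tokens, aligned_labels)))
```

In this toy case the downstream token "New-York" is split into three upstream tokens, each of which keeps the LOC label. Copying labels this way is a simplification: it only handles the case where the upstream tokenizer splits tokens further, not the case where it merges them.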

Authors

Miguel Romero, Louise Lai, Jenny Kong.
