Advances in natural language processing (NLP) and deep learning techniques provide practitioners with an expanded set of options for document classification. This paper lever- ages recent research in this area, applying convolutional neural networks and BERT variants against a challenging real world dataset to evaluate how well these approaches perform against traditional machine learning approaches. We show that, for these data, state-of-the-art techniques can enjoy real advantages over more traditional techniques, but the effect is smaller than one might expect.
Each numbered file corresponds to a section in the report. Most files are divided into notebooks
, models
, and archive
folders. The notebooks
folder contains Jupyter notebooks that directly support the text. The archive
folders contain many notebooks that explore areas we chose not to write about. The archive
folder at the top level of the repo contains work that did not fit neatly into any section.