RIPPED is a framework for bootstrapping natural language understanding in data-poor domains. It uses distances between pretrained sentence embeddings to propagate the few available labels through the unlabeled data. This yields much higher accuracy in challenging classification domains, in particular those that have many classes, are full of domain-specific language, or contain fewer than 10 labeled examples per class.
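
To make the idea concrete, here is a minimal sketch of distance-based label propagation. It is not the code in this repository: the function names (`cosine_similarity_matrix`, `propagate_labels`), the `min_similarity` threshold, the class names, and the 2-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a pretrained sentence encoder.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Return the pairwise cosine similarity between rows of a and rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


def propagate_labels(labeled_vecs, labels, unlabeled_vecs, min_similarity=0.8):
    """Give each unlabeled vector the label of its nearest labeled neighbor.

    Hypothetical helper: a point stays unlabeled (None) when its best match
    falls below min_similarity, so low-confidence labels are not propagated.
    """
    sims = cosine_similarity_matrix(unlabeled_vecs, labeled_vecs)
    nearest = sims.argmax(axis=1)
    best = sims[np.arange(len(unlabeled_vecs)), nearest]
    return [
        labels[j] if s >= min_similarity else None
        for j, s in zip(nearest, best)
    ]


# Toy stand-ins for sentence embeddings (assumed data, one example per class).
labeled_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = ["billing", "shipping"]  # hypothetical class names
unlabeled_vecs = np.array([[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]])

predicted = propagate_labels(labeled_vecs, labels, unlabeled_vecs)
print(predicted)  # ['billing', 'shipping', None]
```

The third point is equally close to both classes and falls below the threshold, so it stays unlabeled. Newly labeled points could be added to the labeled set and the step repeated to spread labels further, but that loop is left out of this sketch.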
This repository contains the code used in writing my honors thesis.
A complete README is coming soon.