This repository contains a library that I use for my natural language processing (NLP) projects.
All the code in the library is based on PyTorch.
Most of the models in the library are built on pretrained models from the sentence-transformers library,
which offers a wide variety of high-performing sentence embedding models and is in turn based on the popular transformers library by Hugging Face.
The library includes:
- Scripts to train and test word-level and sentence-level embedding models on various NLP tasks
- Wrappers around Hugging Face pretrained models to run experiments on text similarity tasks
- A semantic search pipeline built on top of high-performing sentence embedding models and approximate nearest neighbour algorithms
- A model compression pipeline with functions to distill, prune, quantize and convert models to optimized formats such as ONNX, TensorFlow Lite and TorchScript for use on edge devices
- Scripts to train models on a variety of text similarity and sequence classification tasks
- Creation of sense-aware embeddings that exploit WordNet relations and contextualised embeddings
- PySpark integration for faster text preprocessing on larger datasets
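
As a rough illustration of the idea behind the semantic search pipeline, the sketch below ranks corpus sentences by cosine similarity to a query embedding. It is not this library's API: the names `cosine_similarity` and `semantic_search` are hypothetical, the three-dimensional vectors are hand-written toy values standing in for real sentence embeddings, and it uses exact brute-force search instead of an approximate nearest neighbour index.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vec: Vector, corpus_vecs: Sequence[Vector], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs for the top_k most similar vectors.

    Exact search: every corpus vector is scored. An approximate nearest
    neighbour index would replace this loop for large corpora.
    """
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (assumption): in practice the vectors would come from a sentence
# embedding model, e.g. SentenceTransformer(...).encode(sentences).
corpus = [
    "A cat sits on the mat",
    "Stock markets fell today",
    "A kitten rests on a rug",
]
corpus_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.95],
    [0.8, 0.3, 0.1],
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedding of a cat-related query

for index, score in semantic_search(query_vec, corpus_vecs, top_k=2):
    print(f"{score:.3f}  {corpus[index]}")
```

With real embeddings the ranking logic stays the same; only the source of the vectors and, for large corpora, the search backend would change.
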
Mirco Cardinale Personal website