Companion repository to Paul Azunre's "Transfer Learning for Natural Language Processing" Book
Please note that this version of the repo follows a recent significant reordering of chapters. If you are looking for the original, now-outdated ordering used during most of the MEAP, please refer to this repo version
Watch the following intro video first!
Rendered Jupyter notebooks are organized in this repo in folders by chapter, with each folder containing a corresponding kaggle_image_requirements.txt file representing the Kaggle docker image pip dependency dump at the time of the author's latest successful run.
Please note that this requirements file documents the Kaggle environment in which the results reported in the book were achieved. If you try to configure a different environment -- say, your local Apple machine -- by pip installing directly from this file, you will likely run into errors due to the many potential architecture-specific dependency conflicts. In that case, use the requirements file only as a guide; don't expect it to work straight out of the box.
MOST OF THESE REQUIREMENTS WOULD NOT BE NECESSARY FOR LOCAL INSTALLATION, and local installation isn't a usage mode we support -- even though it is certainly a good "real-world" skill-building exercise to go through. If you do decide to work locally, we recommend Anaconda.
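If you do go the local route, one practical way to use a chapter's requirements dump is to look up the pinned versions of just a handful of key libraries rather than installing the whole file. The snippet below is a minimal sketch of that idea (it is not part of the book's code), and the package names it checks are only illustrative:

```python
# Minimal sketch: read a chapter's kaggle_image_requirements.txt and print the
# versions that a few key libraries were pinned at during the notebook's last
# successful run. The package list below is illustrative, not exhaustive.
KEY_PACKAGES = {"tensorflow", "torch", "transformers", "scikit-learn"}

with open("kaggle_image_requirements.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        if name.lower() in KEY_PACKAGES:
            # Match these versions (at least approximately) in your local environment
            print(f"{name}=={version}")
```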
Ideally, run these notebooks directly on Kaggle, where they are already hosted. This requires little to no setup on your part. Be sure to hit "Copy and Edit Kernel" at the top right of each Kaggle kernel page (after creating an account) to get going right away.
Please make sure to "COPY AND EDIT NOTEBOOK" on Kaggle, i.e., fork it, to get compatible library dependencies! DO NOT CREATE A NEW NOTEBOOK AND COPY+PASTE THE CODE -- doing so would pick up the latest pre-installed Kaggle dependencies at the time you create the new notebook, and the code would need to be modified to work with them. Also make sure internet connectivity is enabled on your notebook. See Appendix A of the book for a tutorial introduction to the various features of Kaggle.
The following is a list of notebooks that have been hosted, their corresponding Chapters and Kaggle links.
Chapter(s) | Description | Kaggle link |
---|---|---|
2-3 | Linear & Tree-based models for Email Spam Classification | https://www.kaggle.com/azunre/tlfornlp-chapters2-3-spam-traditional |
2-3 | Linear & Tree-based models for IMDB Movie Review Classification | https://www.kaggle.com/azunre/tlfornlp-chapters2-3-imdb-traditional |
2-3 | ELMo for Email Spam Classification | https://www.kaggle.com/azunre/tlfornlp-chapters2-3-spam-elmo |
2-3 | ELMo for IMDB Movie Review Classification | https://www.kaggle.com/azunre/tlfornlp-chapters2-3-imdb-elmo |
2-3 | BERT for Email Spam Classification | https://www.kaggle.com/azunre/tlfornlp-chapters2-3-spam-bert |
2-3 | Tensorflow 2 BERT for Email Spam Classification | https://www.kaggle.com/azunre/tlfornlp-chapters2-3-spam-bert-tf2 |
2-3 | BERT for IMDB Movie Review Classification | https://www.kaggle.com/azunre/tlfornlp-chapters2-3-imdb-bert |
4 | IMDB Review Classification with word2vec and FastText | https://www.kaggle.com/azunre/tlfornlp-chapter4-imdb-word-embeddings |
4 | IMDB Review Classification with sent2vec | https://www.kaggle.com/azunre/tlfornlp-chapter4-imdb-sentence-embeddings |
4 | Dual Task Learning of IMDB and spam detection | https://www.kaggle.com/azunre/tlfornlp-chapter4-multi-task-learning |
4 | Domain adaptation of IMDB classifier to new domain of Book Review Classification | https://www.kaggle.com/azunre/tlfornlp-chapter4-domain-adaptation |
5-6 | Using SIMOn for column data type classification on baseball and BC library OpenML datasets | https://www.kaggle.com/azunre/tlfornlp-chapters5-6-column-type-classification |
5-6 | Using ELMo for "fake news" detection/classification | https://www.kaggle.com/azunre/tlfornlp-chapters5-6-fake-news-elmo |
7 | Transformer - self-attention visualization and English-Twi translation example with transformers library | https://www.kaggle.com/azunre/tlfornlp-chapter7-transformer |
7 | GPT and DialoGPT in transformers library, application to chatbot and open-ended text generation | https://www.kaggle.com/azunre/tlfornlp-chapter7-gpt |
7, 11 | GPT-Neo (aims to be open GPT-3 "equivalent") in transformers library | https://www.kaggle.com/azunre/tlfornlp-chapter7-gpt-neo |
8 | Applying BERT to filling-in-the-blanks and next sentence prediction | https://www.kaggle.com/azunre/tlfornlp-chapter8-bert |
8 | Fine-tuning mBERT on monolingual Twi data with pre-trained tokenizer | https://www.kaggle.com/azunre/tlfornlp-chapter8-mbert-tokenizer-fine-tuned |
8 | Fine-tuning mBERT on monolingual Twi data with tokenizer trained from scratch | https://www.kaggle.com/azunre/tlfornlp-chapter8-mbert-tokenizer-from-scratch |
9 | Implementing ULMFiT adaptation strategies with fast.ai v1 | https://www.kaggle.com/azunre/tlfornlp-chapter9-ulmfit-adaptation |
9 | Implementing ULMFiT adaptation strategies with fast.ai v2 | https://www.kaggle.com/azunre/tlfornlp-chapter9-ulmfit-adaptation-fast-aiv2 |
9 | Demonstrating advantages of Knowledge Distillation, i.e., DistilBERT, with Hugging Face Transformers library | https://www.kaggle.com/azunre/tlfornlp-chapter9-distillation-adaptation |
10 | Demonstrating advantages of cross-layer parameter sharing and embedding factorization, i.e., ALBERT, with Hugging Face Transformers | https://www.kaggle.com/azunre/tlfornlp-chapter10-albert-adaptation |
10 | Fine-tuning BERT on a single GLUE task of measuring sentence similarity, i.e., STS-B | https://www.kaggle.com/azunre/tlfornlp-chapter10-multi-task-single-glue |
10 | Fine-tuning BERT on multiple GLUE tasks (sequential adaptation): first on a data-rich question similarity task (QQP), then on a sentence similarity task (STS-B) | https://www.kaggle.com/azunre/tlfornlp-chapter10-multi-task-glue-sequential |
10 | Using pretrained adapter modules with AdapterHub instead of fine-tuning | https://www.kaggle.com/azunre/tlfornlp-chapter10-adapters |
Appendix B | Using Tensorflow V1, V2 and PyTorch to compute a function and its gradient via automatic differentiation (see the short sketch after this table) | https://www.kaggle.com/azunre/tl-for-nlp-appendixb |
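The Appendix B row above covers automatic differentiation; for a flavor of the concept, here is a minimal sketch (not the notebook's code, and the function f(x) = x**2 is just an illustrative choice) of computing a value and its gradient with Tensorflow 2.x and PyTorch:

```python
# Minimal sketch: compute f(x) = x**2 and its gradient df/dx = 2x at x = 3
# via automatic differentiation. Assumes Tensorflow 2.x and PyTorch are installed.
import tensorflow as tf
import torch

# Tensorflow 2.x: operations are recorded on a GradientTape
x_tf = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y_tf = x_tf ** 2
print(tape.gradient(y_tf, x_tf).numpy())  # 6.0

# PyTorch: gradients are accumulated on tensors created with requires_grad=True
x_pt = torch.tensor(3.0, requires_grad=True)
y_pt = x_pt ** 2
y_pt.backward()
print(x_pt.grad.item())  # 6.0
```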
To reiterate, just hit "Copy and Edit Kernel" at the top right of each Kaggle kernel page (after creating an account) to get going right away. Note that for GPU-enabled notebooks, your FREE Kaggle GPU time is limited (to 30-40 hours/week in 2020, with the clock resetting at the end of each Friday). Be frugal with it: shut such notebooks down when they are not needed, e.g., while debugging parts of the code that do not require a GPU.
Kaggle frequently updates the dependencies, i.e., the versions of the libraries installed on their docker images. To ensure that you are using the same dependencies we used when writing the code -- so that it works with minimal changes out of the box -- please make sure to select "Copy and Edit Kernel" for each notebook of interest. If you instead copy and paste the code into a new notebook, you may need to adapt it slightly to the library versions installed at the time you created that notebook.
This also applies if you elect to set up a local environment. For local installation, pay attention to the frozen dependency list shared in this repo, which indicates the library versions you will need; keep in mind, again, that most of the requirements will not be necessary for a local installation.
Finally, please note that because ELMo had not yet been ported to Tensorflow 2.x at the time of this writing, we are forced to use Tensorflow 1.x to compare it fairly to BERT. To confirm whether that is still the case, you can check the following link. We have, however, provided a notebook illustrating how to use BERT with Tensorflow 2.x for the spam classification example. In later chapters we transition from Tensorflow and Keras to the Hugging Face transformers library, which uses the latest dependencies in the backend, including Tensorflow 2.x if you prefer it over PyTorch. You can view the exercise in Chapters 2 and 3 as a historical record of, and hands-on experience with, the early packages developed for NLP transfer learning; it also lets you juxtapose Tensorflow 1.x with 2.x. Appendix B of the book helps with this as well by outlining the major differences between Tensorflow 1.x and 2.x.
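As a concrete taste of that transition, the sketch below (not the book's notebook code -- the checkpoint name and example text are illustrative) loads BERT through the transformers library with a Tensorflow 2.x backend:

```python
# Minimal sketch: BERT via the Hugging Face transformers library with a
# Tensorflow 2.x backend. Checkpoint and input text are illustrative.
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

# Tokenize one example "email" and run it through the encoder
inputs = tokenizer("free lottery tickets, click here!", return_tensors="tf")
outputs = model(inputs)

# The first output is the sequence of contextual token embeddings,
# with shape (batch_size, num_tokens, 768) for bert-base checkpoints
print(outputs[0].shape)
```

Swapping TFBertModel for BertModel (and return_tensors="tf" for return_tensors="pt") gives the equivalent PyTorch version.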