
📘 A series of hands-on notebooks for learning the fundamentals of NLP.

Primary language: Jupyter Notebook · MIT License

Fundamentals of NLP

(Work in Progress!)

Natural language processing (NLP) has made substantial advances in the past few years due to the success of modern techniques based on deep learning. With the rising popularity of NLP and the availability of large-scale data in many forms, it is now even more important to understand the inner workings of NLP techniques and concepts, from first principles, as they find their way into real-world applications that affect society at large. Building intuition and gaining a solid grasp of the concepts are both essential for developing innovative techniques, improving research, and building safe, human-centered AI and NLP technologies.

We introduce a new series called Fundamentals of NLP, in which we aim to teach important NLP techniques and concepts starting from first principles. We will introduce the theory and motivation behind each concept covered throughout the series. We will then gain hands-on experience by implementing the different techniques, using both bootstrap methods and industry-standard tools and other open-source libraries. Along the way, we will also cover best practices, share important references, point out common mistakes to avoid when training and building NLP models, and discuss what lies ahead.

Join our Slack community to find out more about this and other ongoing projects, or send me an email at ellfae@gmail.com and I will send you an invite.

Chapters

Chapter 1: Tokenization, Lemmatization, Stemming, and Sentence Segmentation -- Colab notebook, Web version
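As a taste of what Chapter 1 covers, the sketch below illustrates the three ideas with plain Python and the standard library only. These are deliberately naive, rule-based stand-ins (the regex splitting and the suffix stripping are toy heuristics, not the algorithms used in the notebook, which relies on industry-standard libraries):

```python
import re

def segment_sentences(text):
    # Naive rule: split after ., !, or ? when followed by whitespace
    # and a capital letter. Real segmenters handle abbreviations,
    # quotes, and much more.
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())

def tokenize(sentence):
    # Split off punctuation as separate tokens; a toy stand-in for
    # library tokenizers.
    return re.findall(r"\w+|[^\w\s]", sentence)

def stem(token):
    # Crude suffix stripping in the spirit of a stemmer; NOT the real
    # Porter algorithm, which applies staged, condition-guarded rules.
    for suffix in ("ing", "ies", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

text = "Cats are running. They jumped over fences!"
for sent in segment_sentences(text):
    tokens = tokenize(sent)
    print(tokens, "->", [stem(t.lower()) for t in tokens])
```

Note how the toy stemmer maps "jumped" to "jump" but also produces non-words like "fenc" from "fences"; lemmatization, covered in the chapter, instead maps tokens to dictionary forms using vocabulary and morphology.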