This repository contains the code and resources for the course, which covers a broad spectrum of topics in Natural Language Processing (NLP) and generative models, including foundational concepts, advanced architectures, and practical applications.
Please refer to the Syllabus for a detailed overview of the course topics and schedule.
Follow these steps to set up the environment and dependencies:
- Download the Repository:

  ```bash
  git clone https://github.com/eliasjacob/imd1107-nlp.git
  cd imd1107-nlp
  ```

- Run the Download Script:

  ```bash
  bash download_datasets_and_binaries.sh
  ```

- Install Ollama: Download and install Ollama from here.

- Download Llama 3.1 (a quick sanity-check sketch follows this list):

  ```bash
  ollama pull llama3.1
  ```

- Install Dependencies:

  - For GPU support:

    ```bash
    poetry install --sync -E cuda --with cuda
    poetry shell
    ```

  - For CPU-only support:

    ```bash
    poetry install --sync -E cpu
    poetry shell
    ```

- Authenticate Weights & Biases:

  ```bash
  wandb login
  ```
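After pulling the model, you may want to confirm that Ollama is actually serving Llama 3.1 locally. The snippet below is a minimal sketch, not part of the course materials: it assumes the Ollama service is running on its default port (11434) and that the `requests` package is available in your environment.

```python
# Minimal sanity check: ask the local Llama 3.1 model for a short reply.
# Assumes Ollama is running on the default port 11434 (an assumption, adjust if needed).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Say hello in one short sentence.",
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```

If the request fails, make sure the Ollama service is running before retrying.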
This repository is configured to work with Visual Studio Code Dev Containers, providing a consistent and isolated development environment. To use this feature:
- Install Visual Studio Code and the Remote - Containers extension.

- Clone this repository to your local machine (if you haven't already):

  ```bash
  git clone https://github.com/eliasjacob/imd1107-nlp.git
  ```

- Open the cloned repository in VS Code.

- When prompted, click "Reopen in Container" or use the command palette (F1) and select "Remote-Containers: Reopen in Container".

- VS Code will build the Docker container and set up the development environment. This may take a few minutes the first time.

- Once the container is built, you'll have a fully configured environment with all the necessary dependencies installed.
Using Dev Containers ensures that all course participants have the same development environment, regardless of their local setup. It also makes it easier to manage dependencies and avoid conflicts with other projects.
Once the environment is set up, you can start exploring the course materials, running code examples, and working on the practical exercises.
- Some parts of the code may require a GPU for efficient execution. If you don't have access to a GPU, consider using Google Colab.
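If you're not sure whether your machine has a usable GPU, the short check below is one way to find out. It is a minimal sketch that assumes PyTorch is installed in the environment set up above; if it reports no GPU, run the GPU-heavy notebooks on Google Colab instead.

```python
# Minimal GPU availability check (assumes PyTorch is installed).
import torch

if torch.cuda.is_available():
    # Report the name of the first visible CUDA device.
    print(f"CUDA GPU detected: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA GPU detected -- consider using Google Colab for GPU-heavy parts.")
```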
The course will use a top-down teaching method, which is different from the traditional bottom-up approach.
- Top-Down Method: We'll start with a high-level overview and practical application, then delve into the underlying details as needed. This approach helps maintain motivation and provides a clearer picture of how different components fit together.
- Bottom-Up Method: Typically involves learning individual components in isolation before combining them into more complex structures, which can sometimes lead to a fragmented understanding.
Harvard Professor David Perkins, in his book Making Learning Whole, compares learning to playing baseball. Kids don't start by memorizing all the rules and technical details; they begin by playing the game and gradually learn the intricacies. Similarly, in this course, you'll start with practical applications and slowly uncover the theoretical aspects.
Important: Don't worry if you don't understand everything initially. Focus on what things do, not what they are.
- Doing: Engage in coding and building projects.
- Explaining: Write about what you've learned or help others in the course.
You'll be encouraged to follow along with coding exercises and explain your learning to others. Summarizing key points as the course progresses will also be part of the learning process.
Contributions to the course repository are welcome! Follow these steps to contribute:
- Fork the repository.

- Create a new branch:

  ```bash
  git checkout -b feature/YourFeature
  ```

- Make your changes.

- Commit your changes:

  ```bash
  git commit -m 'Add some feature'
  ```

- Push to the branch:

  ```bash
  git push origin feature/YourFeature
  ```

- Create a Pull Request.
For any questions or feedback regarding the course materials or repository, you can contact me.