Project: Understanding Transformers with Multi-Head Attention and PyTorch

The main aim of this project is to deepen my understanding of the Transformer model. I'll focus on two essential components: multi-head attention, which lets the model attend to different parts of the input at once, and positional encodings, which give the model information about word order in a sentence. Everything will be implemented with the PyTorch library.
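
As a starting point, here is a minimal sketch of how the two components might fit together in PyTorch: sinusoidal positional encodings added to the input, followed by PyTorch's built-in `nn.MultiheadAttention` for self-attention. The dimensions (`seq_len=10`, `d_model=16`, `num_heads=4`) are arbitrary illustration values, not choices from the project itself.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal positional encodings from 'Attention Is All You Need'."""
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    # Frequencies decrease geometrically across the embedding dimensions.
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dims get sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dims get cosine
    return pe

# Illustration values, not from the project itself.
seq_len, d_model, num_heads = 10, 16, 4
x = torch.randn(1, seq_len, d_model)  # (batch, seq, features)

# Inject position information before attention, since attention alone
# is permutation-invariant and cannot see word order.
x = x + sinusoidal_positional_encoding(seq_len, d_model)

attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
out, weights = attn(x, x, x)  # self-attention: query = key = value = x

print(out.shape)      # torch.Size([1, 10, 16])
print(weights.shape)  # weights averaged over heads: torch.Size([1, 10, 10])
```

Each of the 4 heads works on a 16/4 = 4-dimensional slice of the embedding, which is what lets different heads specialize on different parts of the input.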