rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Issues
- Missing line of code if not mistaken (#447, opened by AldawsariNLP)
- Bug in Exercise 5.6? (#444, opened by SeriousJ55)
- Improvement idea: MHA `d_out` (#443, opened by d-kleine)
- Small issue in notebook ch06.ipynb (#433, opened by zia-hasan)
- RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! in ch05/01_main-chapter-code/gpt_generate.py (#435, opened by xfpg21421)
- Dropout - activated value (#428, opened by d-kleine)
- [typo] APPENDIX D.1 Learning rate warmup (#424, opened by casinca)
- Llama 3.2 standalone (#418, opened by d-kleine)
- RoPE `inv_freq` code (#410, opened by d-kleine)
- Toy example: Train on a dataset (#415, opened by Iosifts)
- RoPE - compute_rope mismatch of tensor dimensions (#411, opened by rkinas)
- When I tried to fine-tune the llama3-8b into a classifier, there was a problem (#392, opened by YinSonglin1997)
- LLama 3.2 1B model (#387, opened by d-kleine)
- help or not??????????? (#390, opened by DanielRojas20)
- raw_text[:99] is 99 characters, not 100, on page 22. (#379, opened by verhas)
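Regarding the `raw_text[:99]` report above (#379): a Python slice `[:n]` stops just before index `n`, so it yields exactly `n` characters. A minimal sketch of this behavior (the string below is a made-up placeholder, not the book's sample text):

```python
# Slicing s[:n] returns the characters at indices 0..n-1, i.e. exactly n characters.
raw_text = "a" * 200  # placeholder standing in for the book's sample text

first_99 = raw_text[:99]
first_100 = raw_text[:100]

print(len(first_99))   # 99 -- one character short of 100
print(len(first_100))  # 100 -- use [:100] to get the first 100 characters
```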
- Minor: RLHF figure typos (#377, opened by d-kleine)
- Section 2.6 (41) RuntimeError (#365, opened by lmw4051)
- Wrong supported context length (#371, opened by Safarveisi)
- Reflection Finetuning (#350, opened by d-kleine)
- GitHub's image rendering issue (#338, opened by rasbt)
- Google colab file related (#342, opened by athul-22)
- The GPT 2 Model url link needs to be corrected (#272, opened by satyamedidi)
- Any plan to introduce kv cache or something alike? (#331, opened by npuichigo)
- Duplicated code cell (Exercise 5.3: Deterministic behavior in the decoding functions) (#328, opened by labdmitriy)
- Different function names for the same function (5.3.3 Modifying the text generation function) (#327, opened by labdmitriy)
- An unusual link in the pdf version (5.1.2 Calculating the text generation loss) (#310, opened by labdmitriy)
- Different figures in the book and jupyter notebook for Figure 5.9 (5.1.3 Calculating the training and validation set losses) (#311, opened by labdmitriy)
- Edge case: Gradient accumulation (#299, opened by d-kleine)
- Several typos/questions (Sections 4.1-4.2) (#296, opened by labdmitriy)
- lower validation&train loss with poorer performance (#292, opened by TITC)
- generate_model_scores function bug? (#286, opened by gcapuzzi)
- Question of training on conversational datasets (#283, opened by qibin0506)
- Inconsistency in cases for applying dropout in the book and in the notebook (3.5.2 Masking additional attention weights with dropout) (#280, opened by labdmitriy)
- (Enhancement) Applying mask to attention in one operation (3.5 Hiding future words with causal attention) (#279, opened by labdmitriy)
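Issue #279 above proposes applying the causal mask to the attention weights in a single operation. As a rough illustration of the underlying idea only (plain Python rather than the book's PyTorch code, with invented example scores; not the repository's actual implementation):

```python
import math

def causal_softmax(scores):
    """Mask future positions and normalize each row in one pass.

    `scores` is a square list-of-lists of raw attention scores; entries
    above the diagonal (future tokens) are set to -inf so they receive
    zero weight after the softmax. Illustrative sketch only.
    """
    n = len(scores)
    out = []
    for i, row in enumerate(scores):
        masked = [row[j] if j <= i else float("-inf") for j in range(n)]
        m = max(masked[: i + 1])  # subtract the row max for numerical stability
        exps = [math.exp(v - m) if v != float("-inf") else 0.0 for v in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

weights = causal_softmax([[0.1, 0.2, 0.3],
                          [0.4, 0.5, 0.6],
                          [0.7, 0.8, 0.9]])
# The first row attends only to itself: weights[0] == [1.0, 0.0, 0.0]
```

In PyTorch the same effect is typically achieved by filling the masked positions with `-inf` before a single softmax call, which is the "one operation" the issue refers to.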
- Detaching a Tensor from the graph doesn't have effect (Embedding Layers and Linear Layers) (#277, opened by labdmitriy)
- Duplicated line in the Listing 2.6 (2.6 Data sampling with a sliding window) (#275, opened by labdmitriy)
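The last entry (#275) concerns Listing 2.6's sliding-window data sampling. As a hedged, dependency-free sketch of what such sampling produces (the function name and parameter values here are invented for illustration; this is not the book's actual listing):

```python
def sliding_windows(token_ids, context_size, stride):
    """Yield (input, target) pairs where the target is the input shifted by one token.

    Illustrative sketch of sliding-window sampling over a token-ID sequence;
    the values below are arbitrary examples.
    """
    pairs = []
    for i in range(0, len(token_ids) - context_size, stride):
        x = token_ids[i : i + context_size]          # input chunk
        y = token_ids[i + 1 : i + context_size + 1]  # next-token targets
        pairs.append((x, y))
    return pairs

pairs = sliding_windows(list(range(10)), context_size=4, stride=4)
# First pair: inputs [0, 1, 2, 3], targets [1, 2, 3, 4]
```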