rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Issues
- Missing line of code if not mistaken (#447, opened by AldawsariNLP)
- Bug in Exercise 5.6? (#444, opened by SeriousJ55)
- Improvement idea: MHA `d_out` (#443, opened by d-kleine)
- Small issue in notebook ch06.ipynb (#433, opened by zia-hasan)
- RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! in ch05/01_main-chapter-code/gpt_generate.py (#435, opened by xfpg21421)
- Dropout - activated value (#428, opened by d-kleine)
- [typo] APPENDIX D.1 Learning rate warmup (#424, opened by casinca)
- Llama 3.2 standalone (#418, opened by d-kleine)
- RoPE `inv_freq` code (#410, opened by d-kleine)
- Toy example: Train on a dataset (#415, opened by Iosifts)
- RoPE - compute_rope mismatch of tensor dimensions (#411, opened by rkinas)
- When I tried to fine-tune the llama3-8b into a classifier, there was a problem (#392, opened by YinSonglin1997)
- LLama 3.2 1B model (#387, opened by d-kleine)
- help or not??????????? (#390, opened by DanielRojas20)
- raw_text[:99] is 99 characters, not 100, on page 22. (#379, opened by verhas)
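Regarding the `raw_text[:99]` report above (#379): a Python slice `[:n]` stops just before index `n`, so it yields exactly `n` characters. A minimal sketch of this behavior (the string below is a made-up placeholder, not the book's sample text):

```python
# Slicing s[:n] returns the characters at indices 0..n-1, i.e. exactly n characters.
raw_text = "a" * 200  # placeholder standing in for the book's sample text

first_99 = raw_text[:99]
first_100 = raw_text[:100]

print(len(first_99))   # 99 -- one character short of 100
print(len(first_100))  # 100 -- use [:100] to get the first 100 characters
```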
- Minor: RLHF figure typos (#377, opened by d-kleine)
- Section 2.6 (41) RuntimeError (#365, opened by lmw4051)
- Wrong supported context length (#371, opened by Safarveisi)
- Reflection Finetuning (#350, opened by d-kleine)
- GitHub's image rendering issue (#338, opened by rasbt)
- Google colab file related (#342, opened by athul-22)
- The GPT 2 Model url link needs to be corrected (#272, opened by satyamedidi)
- Any plan to introduce kv cache or something alike? (#331, opened by npuichigo)
- Duplicated code cell (Exercise 5.3: Deterministic behavior in the decoding functions) (#328, opened by labdmitriy)
- Different function names for the same function (5.3.3 Modifying the text generation function) (#327, opened by labdmitriy)
- An unusual link in the pdf version (5.1.2 Calculating the text generation loss) (#310, opened by labdmitriy)
- Different figures in the book and jupyter notebook for Figure 5.9 (5.1.3 Calculating the training and validation set losses) (#311, opened by labdmitriy)
- Edge case: Gradient accumulation (#299, opened by d-kleine)
- Several typos/questions (Sections 4.1-4.2) (#296, opened by labdmitriy)
- lower validation&train loss with poorer performance (#292, opened by TITC)
- generate_model_scores function bug? (#286, opened by gcapuzzi)
- Question of training on conversational datasets (#283, opened by qibin0506)
- Inconsistency in cases for applying dropout in the book and in the notebook (3.5.2 Masking additional attention weights with dropout) (#280, opened by labdmitriy)
- (Enhancement) Applying mask to attention in one operation (3.5 Hiding future words with causal attention) (#279, opened by labdmitriy)
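Issue #279 above proposes applying the causal mask to the attention weights in a single operation. As a rough illustration of the underlying idea only (plain Python rather than the book's PyTorch code, with invented example scores; not the repository's actual implementation):

```python
import math

def causal_softmax(scores):
    """Mask future positions and normalize each row in one pass.

    `scores` is a square list-of-lists of raw attention scores; entries
    above the diagonal (future tokens) are set to -inf so they receive
    zero weight after the softmax. Illustrative sketch only.
    """
    n = len(scores)
    out = []
    for i, row in enumerate(scores):
        masked = [row[j] if j <= i else float("-inf") for j in range(n)]
        m = max(masked[: i + 1])  # subtract the row max for numerical stability
        exps = [math.exp(v - m) if v != float("-inf") else 0.0 for v in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

weights = causal_softmax([[0.1, 0.2, 0.3],
                          [0.4, 0.5, 0.6],
                          [0.7, 0.8, 0.9]])
# The first row attends only to itself: weights[0] == [1.0, 0.0, 0.0]
```

In PyTorch the same effect is typically achieved by filling the masked positions with `-inf` before a single softmax call, which is the "one operation" the issue refers to.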
- Detaching a Tensor from the graph doesn't have effect (Embedding Layers and Linear Layers) (#277, opened by labdmitriy)
- Duplicated line in the Listing 2.6 (2.6 Data sampling with a sliding window) (#275, opened by labdmitriy)
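The last entry (#275) concerns Listing 2.6's sliding-window data sampling. As a hedged, dependency-free sketch of what such sampling produces (the function name and parameter values here are invented for illustration; this is not the book's actual listing):

```python
def sliding_windows(token_ids, context_size, stride):
    """Yield (input, target) pairs where the target is the input shifted by one token.

    Illustrative sketch of sliding-window sampling over a token-ID sequence;
    the values below are arbitrary examples.
    """
    pairs = []
    for i in range(0, len(token_ids) - context_size, stride):
        x = token_ids[i : i + context_size]          # input chunk
        y = token_ids[i + 1 : i + context_size + 1]  # next-token targets
        pairs.append((x, y))
    return pairs

pairs = sliding_windows(list(range(10)), context_size=4, stride=4)
# First pair: inputs [0, 1, 2, 3], targets [1, 2, 3, 4]
```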