JonasGeiping/cramming
Cramming the training of a (BERT-type) language model into limited compute.
Python · MIT
Issues
How to load local data
#46 opened by Doraemonzzz - 15
From PR 43
#44 opened by JonasGeiping - 2
Configs for GPT?
#41 opened by rexdu2003 - 3
Finetuning for token classification
#40 opened by druskacik - 12
TypeError: _load_optimizer() missing 1 required positional argument: 'initial_time'
#38 opened by vincent-163 - 2
try it on Mac M1 but failed
#36 opened by yangboz - 2
can't import cramming
#37 opened by RobinRojowiec - 2
Finetuning for SQuAD task
#35 opened by kisacats - 8
Question about sparse token prediction
#33 opened by leo-du - 5
Issue with torch.compile / dynamo
#32 opened by spencerfrei - 5
GLUE evaluation numbers are very poor, if increase the sequence length to 512 and float 32
#28 opened by tbaggu - 2
Errors with both the verify installation command as well as the final recipe
#27 opened by tatami-galaxy - 2
Pretraining on a single RTX 3060
#26 opened by TahaBinhuraib - 7
Cola dataset evaluation
#21 opened by TahaBinhuraib - 10
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)` while running evaluation
#24 opened by tbaggu - 9
Preprocessed files on S3/Google Drive
#13 opened by tals - 4
Verification command fails on macOS
#12 opened by laclouis5 - 5
Can't evaluate
#16 opened by TahaBinhuraib - 8
Reproduce the result when freezing parameters
#15 opened by sleeepeer - 5
Training Step Count
#11 opened by ekurtulus - 2
Preprocessing for final recipe
#2 opened by florianmai - 4
Storage space requirement
#6 opened by okpatil4u - 3
preprocessed c4 dataset?
#7 opened by w32zhong