Issues
the loss spike
#560 opened by bpwl0121 - 5
why is the total_grad_norm increasing across training?
#596 opened by ryanyxw - 4
OLMoThreadError
#552 opened by juripapay - 1
OLMoThreadError
#591 opened by lecifire - 1
is_causal=attention_bias is None
#598 opened by nkkbr - 2
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
#563 opened by Jimmy-Yang1217 - 1
Expose memmap_dtype in the data configuration
#595 opened by leon-g-xu - 0
OLMo-1B's results seem very bad on olmo-eval
#583 opened by Ivan-Zhou - 5
Problem with HF loading from model checkpoint
#586 opened by ryanyxw - 6
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
#581 opened by mclanza - 1
training directly from object storage?
#592 opened by joellliu - 6
why customized optimizers?
#566 opened by joellliu - 2
Gradient Checkpointing
#549 opened by fakerybakery - 0
Problems with multi-epoch training
#584 opened by Muennighoff - 0
PyTorch 2.3 Support for OLMo
#578 opened by prakamya-mishra - 0
Inference using llama.cpp
#571 opened by nopperl - 0
Support DDP
#570 opened by Muennighoff - 4
OLMo 7B finetuning w/ CPU offloading does not work
#478 opened by gahdritz - 6
Clarification about weight tying mechanism in OLMo 1B: shared modules vs. shared weights
#485 opened by djliden - 2
I'm interested in OLMo-twin, but I found no more information except its name.
#479 opened by HuXinjing - 2
Question about Pre-training Olmo 7B
#497 opened by michaellin99999 - 2
nan loss encountered
#498 opened by jiaxiaolei007 - 2
Training on a single GPU
#502 opened by ShenZhen0502 - 2
scripts/prepare_tulu_data.py ERROR
#475 opened by Maxhyl - 3
Break at 1 epoch "Training epoch complete", can't pretraining beyond 1 epoch ?
#554 opened by Xuekai-Zhu - 6
Flash attention 2.0 support
#557 opened by johnhalloran321 - 1
How can I download the checkpoint
#519 opened by LianhaoXue - 3
Flash Attention for AMD GPUs
#539 opened by prakamya-mishra - 1
Checkpoints for instruction-tuned model
#553 opened by alexnikulkov - 3
SFT and DPO Finetuning Code Available?
#545 opened by YilunZhou - 2
Issue training on multiple nodes
#550 opened by edwardsp - 1
Something weird with Instruct Model
#518 opened by y12uc231 - 0
IndexError in OLMo-7B pre-training dataset
#538 opened by Bread0288 - 1
Question about optim.pt
#531 opened by xijiu9 - 1
Training stability without loss scaling
#524 opened by hwijeen - 2
nan loss encountered
#506 opened by abrahamhwj - 0
Question about the tokens/per second/GPU
#522 opened by P3ngLiu - 2
could some one help why the tokens begin with G.
#505 opened by oras903 - 2
olmo-torch2-base does not exist in docker hub
#496 opened by Arvin-Hu - 2
How can I finetune Olmo-7B using this repo?
#499 opened by rajasekharmekala