allenai/OLMo

Modeling, training, eval, and inference code for OLMo

Python · Apache-2.0

Issues

the loss spike
#560 opened 2 months ago by bpwl0121
6
why is the total_grad_norm increasing across training?
#596 opened 16 days ago by ryanyxw
5
OLMoThreadError
#552 opened 2 months ago by juripapay
4
OLMoThreadError
#591 opened 19 days ago by lecifire
1
is_causal=attention_bias is None
#598 opened 13 days ago by nkkbr
1
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
#563 opened a month ago by Jimmy-Yang1217
2
Default eos_token_id in `scripts/prepare-tulu-data.py`
#597 opened 11 days ago by y0mingzhang
1
Expose memmap_dtype in the data configuration
#595 opened 17 days ago by leon-g-xu
1
OLMo-1B's results seem very bad on olmo-eval
#583 opened 17 days ago by Ivan-Zhou
0
Problem with HF loading from model checkpoint
#586 opened 18 days ago by ryanyxw
5
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
#581 opened 18 days ago by mclanza
6
training directly from object storage?
#592 opened 19 days ago by joellliu
1
why customized optimizers?
#566 opened 19 days ago by joellliu
6
Gradient Checkpointing
#549 opened 2 months ago by fakerybakery
2
Problems with multi-epoch training
#584 opened 23 days ago by Muennighoff
0
PyTorch 2.3 Support for OLMo
#578 opened a month ago by prakamya-mishra
0
Inference using llama.cpp
#571 opened a month ago by nopperl
0
Support DDP
#570 opened a month ago by Muennighoff
0
OLMo 7B finetuning w/ CPU offloading does not work
#478 opened 3 months ago by gahdritz
4
Clarification about weight tying mechanism in OLMo 1B: shared modules vs. shared weights
#485 opened a month ago by djliden
6
Shape mismatch error when resizing token embeddings in OLMo modeling code
#491 opened a month ago by djliden
2
I'm interested in OLMo-twin, but I found no more information except its name.
#479 opened a month ago by HuXinjing
3
Question about Pre-training Olmo 7B
#497 opened a month ago by michaellin99999
2
nan loss encountered
#498 opened a month ago by jiaxiaolei007
2
Training on a single GPU
#502 opened a month ago by ShenZhen0502
2
Why does training not stop after max_duration steps?
#474 opened a month ago by davidbrandfonbrener
2
scripts/prepare_tulu_data.py ERROR
#475 opened a month ago by Maxhyl
1
Breaks at 1 epoch with "Training epoch complete"; can't pretrain beyond 1 epoch?
#554 opened 2 months ago by Xuekai-Zhu
3
No module named 'torch.distributed.device_mesh'
#559 opened 2 months ago by prakamya-mishra
6
Flash attention 2.0 support
#557 opened 2 months ago by johnhalloran321
3
How can I download the checkpoint
#519 opened 3 months ago by LianhaoXue
1
Flash Attention for AMD GPUs
#539 opened 2 months ago by prakamya-mishra
3
Checkpoints for instruction-tuned model
#553 opened 2 months ago by alexnikulkov
1
SFT and DPO Finetuning Code Available?
#545 opened 2 months ago by YilunZhou
3
Issue training on multiple nodes
#550 opened 2 months ago by edwardsp
2
Duplicate tokenizer entry in `config.json` `auto_map` section
#544 opened 2 months ago by djliden
1
Something weird with Instruct Model
#518 opened 3 months ago by y12uc231
1
IndexError in OLMo-7B pre-training dataset
#538 opened 2 months ago by Bread0288
0
how to train a model based on the v1_6-sample dataset on a local trainset
#535 opened 2 months ago by scalaboy
1
1B generates whitespace after a specific amount of fine-tuning
#515 opened 2 months ago by KaiserWhoLearns
1
TypeError: forward() got an unexpected keyword argument 'cache_position'
#525 opened 2 months ago by sanderland
4
Question about optim.pt
#531 opened 2 months ago by xijiu9
0
Training stability without loss scaling
#524 opened 2 months ago by hwijeen
1
nan loss encountered
#506 opened 3 months ago by abrahamhwj
2
Question about the tokens per second per GPU
#522 opened 3 months ago by P3ngLiu
0
prepare_memmap_dataset.py seems to use wrong eos_token_id for the tokenizer
#513 opened 3 months ago by wsonejoy
2
FSDPMixedPrecision setting, Logit norm growth, z-loss.
#514 opened 3 months ago by maximilianmbeck
0
could some one help why the tokens begin with G.
#505 opened 3 months ago by oras903
2
olmo-torch2-base does not exist in docker hub
#496 opened 3 months ago by Arvin-Hu
2
How can I finetune Olmo-7B using this repo?
#499 opened 3 months ago by rajasekharmekala
2