sangmichaelxie/doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

HTML · MIT

Issues

Question about Group DRO implementation
#33 opened 3 months ago by NicholasCorrado
0
Request for Redpajama Dataset Weights
#32 opened 5 months ago by desomeboy
0
AssertionError: assert q.dtype in [torch.float16, torch.bfloat16]
#31 opened 6 months ago by Richard-Wth
2
Question about the initialization of the perdomain_scores
#26 opened 6 months ago by yuzc19
1
Question about 8B model architecture
#28 opened 6 months ago by Qinghao-Hu
1
ModuleNotFoundError: No module named 'flash_attn.models.falcon'
#22 opened a year ago by Sniper970119
11
Cuda version problem
#27 opened a year ago by RRaphaell
2
program stuck (when "Loading cached shuffled indices for dataset at ...")
#29 opened 9 months ago by ccx06
3
Question about model initialization
#30 opened 8 months ago by MAxx8371
0
Cannot reproduce the results shown in Github repo with the 120M reference model on A800 (8*80G).
#20 opened a year ago by kiseliu
17
Questions about the loss used for optimizing the proxy model
#25 opened a year ago by clarkkent0618
3
List of pinned requirements / Dockerfile?
#19 opened a year ago by filipg7777
2
Speed decrease during training
#24 opened a year ago by ljb121002
1
Questions about directly applying the weights from paper or the repo to train main model
#23 opened a year ago by clarkkent0618
2
Edge Case Discussion
#21 opened a year ago by thangld201
1
Question about optimized weights in the paper
#18 opened a year ago by yuzc19
4
Training time for baseline model and proxy model
#17 opened a year ago by yuzc19
1
question about only updating the domain weights on process 0
#8 opened a year ago by SueJane
4
How many rounds do we need to converge domain weights on The Pile?
#15 opened a year ago by ouyangliqi
1
easy HF dataset doremi?
#10 opened a year ago by brando90
2
How do you get the model to be good at code if it downsamples code?
#13 opened a year ago by teknium1
1
Should reference model initialize weights uniformly?
#11 opened a year ago by ouyangliqi
3
loss computation wrong?
#9 opened a year ago by tt6746690
2
Question about Flash-attention version.
#12 opened a year ago by kiseliu
1
Domain weights are mostly near one-hot
#5 opened a year ago by xiamengzhou
3
question about domain weights initialization value in paper figure 8
#7 opened a year ago by Haijunlv
1
Multi-nodes support
#6 opened a year ago by binxuan
1
about loss
#3 opened 2 years ago by ywb2018
1
step 1 baseline_280M loss large
#1 opened 2 years ago by gawei1995
5
Adding a license
#2 opened 2 years ago by virtualzx-nad
1