Issues
Multi-node training
#305 opened by LeoXinhaoLee - 1
NotImplementedError running HF model "mlfoundations/dclm-7b-it" for inference
#303 opened by neginraoof - 0
How to pretrain on DCLM-BASELINE
#304 opened by mathfinder - 0
Webdataset version issue
#301 opened by GeorgiosSmyrnis - 2
I got an error in open lm installation
#297 opened by orhanerday - 0
Fine-Tuned Models for open_lm
#296 opened by OLMResearch - 1
composer ICL metrics deprecated
#288 opened by ysharma1126 - 0
Remote Sync FSSPEC cannot upload large checkpoints
#279 opened by Skylion007 - 4
"Number of shards requested for a single epoch is more than the number of shards available" in the middle of a training run
#189 opened by afang-story - 6
xfomers installation failed
#267 opened by stevensf1998 - 0
Reduce logging when --torchcompile is passed
#261 opened by achalddave - 1
MoE performs worse than equivalent dense model?
#253 opened by Muennighoff - 6
Make torch.compile work with fsdp and xformers
#72 opened by sagadre - 1
Fix tokenize shuffle issues (speed + correctness)
#212 opened by Vaishaal - 0
MoE Expert parallelism config
#251 opened by Muennighoff - 1
Someone is using your project to sell it as a token
#247 opened by yzthink - 3
Import from attention.py error
#202 opened by sedrick-keh-tri - 0
Support user specified token pre-processing functions
#194 opened by sagadre - 0
Factorize helper function for all model loading
#181 opened by sagadre - 0
Use distributed when world_size=1 if requested
#170 opened by achalddave - 0
grad accum tests failing on gpu w/ amp_bf16 precision
#171 opened by sagadre - 0
`--delete-previous-checkpoint` should delete prev checkpoints in `--remote-sync` bucket
#166 opened by sagadre - 0
Error early if we don't have enough disk space
#154 opened by achalddave - 1
Deduplicate argparse namespace creation for tests
#156 opened by achalddave - 0
Factor out parameter error checking
#107 opened by sagadre - 3
HF Integration
#89 opened by sedrick-keh-tri - 3
Add test for checkpoint loading after save
#145 opened by achalddave - 6
Figure out why AdamW + gradient accumulation leads to different results for test case
#126 opened by achalddave - 1
Minimize how often we load args.resume
#71 opened by achalddave - 0
Investigate effect of FSDP policies on mamba speed
#144 opened by sagadre - 0
Improve dataloading.
#70 opened by GeorgiosSmyrnis - 0
Move dummy cred download into test
#121 opened by achalddave - 0
clean up model_configs directory
#116 opened by kernelmachine - 1
error checking params.py
#95 opened by sagadre - 0
Dataloading Epoch Update Bug
#93 opened by sedrick-keh-tri - 0
open_lm chronicles
#90 opened by iejMac - 0
Use no_sync when doing gradient accumulation
#48 opened by achalddave - 0
Tokenization on-the-fly without slowdown
#55 opened by sagadre - 0
llama2 unit tests
#52 opened by sagadre