Issues
LLaMA PRO training resume problem
#1231 opened by germanjke - 1
Conversion Sharded -> Monolithic checkpoint
#1220 opened by pretidav - 1
Finetuning does not work on nightly
#1221 opened by eldarkurtic - 0
Add State Space Models / Mamba Layer Support
#1174 opened by devin-ai-integration - 1
Train with attention mask
#1183 opened by germanjke - 3
MoE with FSDP
#1197 opened by Muennighoff - 1
Any plan for supporting DPO?
#846 opened by lorabit110 - 4
Observing 1/2 the throughput on AMD MI250
#1153 opened by staghado - 1
Possibility of training with hostname instead of IP
#1180 opened by germanjke - 3
Evaluation for long_context_tasks failed with a KeyError: 'continuation_indices'
#1073 opened by songkq - 1
Is there a way to figure out what dependencies are installed in the docker image?
#1116 opened by sc-gr - 1
Opt-3b Pretrain YAML config failing with mosaicml/llm-foundry/2.2.1_cu121_flash2-4aef5de docker
#1141 opened by bhavnicksm - 3
Can you add the pre-training of dbrx?
#1074 opened by win10ogod - 1
Can't create environment on A100 server
#863 opened by eldarkurtic - 2
convert_dataset_hf.py example stuck
#906 opened by eldarkurtic - 10
How does packing work for non-MPT models?
#839 opened by lorabit110 - 12
On-the-fly tokenization with multiple streams
#1030 opened by germanjke - 2
Installation issue from habana_alpha branch
#1090 opened by palash04 - 2
Setting Dropout in MPT Prefix-LM after Exporting to HuggingFace Crashes during Fine-tuning
#1046 opened by timsteuer - 3
Wrong number of samples for C4?
#968 opened by eldarkurtic - 2
Freeze when using cpu offload
#879 opened by gywlssww - 11
`ValueError` when following finetuning `mpt-7b-arc-easy--gpu.yaml` example with different default batch size
#947 opened by ouhenio - 4
How to run inference/convert_composer_to_hf.py with MPT-1B model on Habana Gaudi 2, file formats do not match
#917 opened by greg-serochi - 2
Is flops calculation correct?
#909 opened by lorabit110 - 2
FP8 not working
#903 opened by prigoyal - 1
Loss curve differences for pretraining
#910 opened by maxidl - 2
Issues in using FP8 for MPT baselines on H100
#885 opened by prigoyal - 1
Loss spikes and explode with mpt-1b model pretrain
#759 opened by sagorbrur - 2
sharded model format
#840 opened by j-Gaow - 2
Triton attention patch from Mistral
#860 opened by germanjke - 0
How to record loss curve?
#854 opened by YixinSong-e - 1
Error loading JSON fine-tuning datasets
#852 opened by lorabit110 - 9
How to support new ICL task types in own codebase
#848 opened by sanjari-orb - 2
buffer is too small for requested array
#753 opened by MLlove0402 - 1
Request to Support AWS Inferentia2 for More Cost-Effective and Faster Inference in MPT
#767 opened by anjiefang - 2
RuntimeError: Tensors must be CUDA and dense
#777 opened by karandua2016 - 7
flash attention 2 setup.py
#722 opened by germanjke - 0
`eval.py` hangs when config yaml's model hparams don't match model checkpoint hparams
#755 opened by growlix - 3
Bloom Tokenizer doesn't work
#726 opened by robi56 - 2
Benchmarking GLUE tasks for in-context learning
#707 opened by ashim95 - 5