Issues
Finetuning dataloader does not respect shuffle seed
#1676 opened by janEbert - 0
Failing to build in Docker
#1678 opened by swood - 1
ERROR:composer.cli.launcher:Global rank 0 (PID 208865) exited with code -11
#1501 opened by AndrewHYC - 3
Setting Dropout in MPT Prefix-LM after Exporting to HuggingFace Crashes during Fine-tuning
#1046 opened by timsteuer - 2
Not able to install Transformer Engine for fp8
#1526 opened by palash04 - 1
How to evaluate a model using multiple GPUs?
#1541 opened by lqniunjunlper - 2
Bug in convert_dataset_hf
#1575 opened by eitanturok - 1
MI300X Compatibility
#1568 opened by nikhil-tensorwave - 7
TriviaQA metrics wrong!
#1557 opened by lqniunjunlper - 1
When fine-tuning Llama 3, an error occurs
#1508 opened by AndrewHYC - 1
Converting checkpoints to HF after applying surgery algorithms
#1492 opened by Extirpater - 2
How to continue pretraining an LLM in fp8 with hf_causal_lm
#1261 opened by YixinSong-e - 1
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
#1484 opened by eldarkurtic - 2
Training cannot start
#1469 opened by yuezih - 2
Fill in the middle
#1264 opened by germanjke - 2
Update StreamingTextDataset to support truncation that outputs multiple truncated items
#1363 opened by LingxiaoShawn - 1
Profiler issue
#1448 opened by germanjke - 1
Opt-3b Pretrain YAML config failing with mosaicml/llm-foundry/2.2.1_cu121_flash2-4aef5de docker
#1141 opened by bhavnicksm - 2
Installation issue from habana_alpha branch
#1090 opened by palash04 - 7
LLaMA PRO training resume problem
#1231 opened by germanjke - 1
Train with attention mask
#1183 opened by germanjke - 4
MPT training with ALiBi and Flash Attention 2
#1289 opened by rickgit16 - 3
Composer LoRA weights conversion
#1325 opened by zhao-lun - 1
Allow multiprocessing when preparing ICL dataset
#1276 opened by sanjari-orb - 1
Why is there a warmup in hf_generate.py?
#1271 opened by palash04 - 1
Managing Timeout on Training Errors and Simultaneous Restart of All Nodes in LLM Foundry
#1272 opened by germanjke - 1
Could you provide detailed steps for running llm-foundry on AMD MI250 devices?
#1242 opened by Alice1069 - 2
Finetuning does not work on nightly
#1221 opened by eldarkurtic - 1
Conversion Sharded -> Monolithic checkpoint
#1220 opened by pretidav - 0
Add State Space Models / Mamba Layer Support
#1174 opened by devin-ai-integration - 3
MoE with FSDP
#1197 opened by Muennighoff - 4
Observing 1/2 the throughput on AMD MI250
#1153 opened by staghado - 1
Possibility of training with hostname instead of IP
#1180 opened by germanjke - 3
Evaluation for long_context_tasks failed with a KeyError: 'continuation_indices'
#1073 opened by songkq - 1
Is there a way to figure out what dependencies are installed in the docker image?
#1116 opened by sc-gr - 3
Can you add pre-training for DBRX?
#1074 opened by win10ogod - 1
On-the-fly tokenization with multiple streams
#1030 opened by germanjke - 3
Wrong number of samples for C4?
#968 opened by eldarkurtic - 2
`ValueError` when following finetuning `mpt-7b-arc-easy--gpu.yaml` example with different default batch size
#947 opened by ouhenio - 4
How to run inference/convert_composer_to_hf.py with MPT-1B model on Habana Gaudi 2, file formats do not match
#917 opened by greg-serochi - 1
Loss curve differences for pretraining
#910 opened by maxidl