Issues
LLaMA PRO training resume problem
#1231 opened by germanjke - 1
Conversion Sharded -> Monolithic checkpoint
#1220 opened by pretidav - 1
Finetuning does not work on nightly
#1221 opened by eldarkurtic - 0
Add State Space Models / Mamba Layer Support
#1174 opened by devin-ai-integration - 1
Train with attention mask
#1183 opened by germanjke - 3
MoE with FSDP
#1197 opened by Muennighoff - 1
Any plan for supporting DPO?
#846 opened by lorabit110 - 4
Observing 1/2 the throughput on AMD MI250
#1153 opened by staghado - 1
Possibility of training with hostname instead of IP
#1180 opened by germanjke - 3
Evaluation for long_context_tasks failed with a KeyError: 'continuation_indices'
#1073 opened by songkq - 1
Is there a way to figure out what dependencies are installed in the docker image?
#1116 opened by sc-gr - 1
Opt-3b Pretrain YAML config failing with mosaicml/llm-foundry/2.2.1_cu121_flash2-4aef5de docker
#1141 opened by bhavnicksm - 3
Can you add the pre-training of dbrx?
#1074 opened by win10ogod - 1
Can't create environment on A100 server
#863 opened by eldarkurtic - 2
convert_dataset_hf.py example stuck
#906 opened by eldarkurtic - 10
How does packing work for non-MPT models?
#839 opened by lorabit110 - 12
On-the-fly tokenization with multiple streams
#1030 opened by germanjke - 2
Installation issue from habana_alpha branch
#1090 opened by palash04 - 2
Setting Dropout in MPT Prefix-LM after Exporting to HuggingFace Crashes during Fine-tuning
#1046 opened by timsteuer - 3
Wrong number of samples for C4?
#968 opened by eldarkurtic - 2
Freeze when using cpu offload
#879 opened by gywlssww - 11
`ValueError` when following finetuning `mpt-7b-arc-easy--gpu.yaml` example with different default batch size
#947 opened by ouhenio - 4
How to run inference/convert_composer_to_hf.py with MPT-1B model on Habana Gaudi 2, file formats do not match
#917 opened by greg-serochi - 2
Is flops calculation correct?
#909 opened by lorabit110 - 2
FP8 not working
#903 opened by prigoyal - 1
Loss curve differences for pretraining
#910 opened by maxidl - 2
Issues in using FP8 for MPT baselines on H100
#885 opened by prigoyal - 1
Loss spikes and explode with mpt-1b model pretrain
#759 opened by sagorbrur - 2
sharded model format
#840 opened by j-Gaow - 2
Triton attention patch from Mistral
#860 opened by germanjke - 0
How to record loss curve?
#854 opened by YixinSong-e - 1
Error loading JSON fine-tuning datasets
#852 opened by lorabit110 - 9
How to support new ICL task types in own codebase
#848 opened by sanjari-orb - 2
buffer is too small for requested array
#753 opened by MLlove0402 - 1
Request to Support AWS Inferentia2 for More Cost-Effective and Faster Inference in MPT
#767 opened by anjiefang - 2
RuntimeError: Tensors must be CUDA and dense
#777 opened by karandua2016 - 7
flash attention 2 setup.py
#722 opened by germanjke - 0
`eval.py` hangs when config yaml's model hparams don't match model checkpoint hparams
#755 opened by growlix - 3
Bloom Tokenizer doesn't work
#726 opened by robi56 - 2
Benchmarking GLUE tasks for in-context learning
#707 opened by ashim95 - 5