Issues
- ValueError: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float32 (#61, opened by daje0601, 0 comments)
- OOM error in FSDP QLORA setup (#60, opened by ss8319, 0 comments)
- Quantization question (#56, opened by aptum11, 0 comments)
- Deprecation warnings (#52, opened by hohoCode, 0 comments)
- What's the use of "messages" in dpo step? (#48, opened by katopz, 0 comments)
- question about DeepSpeedPeftCallback (#47, opened by mickeysun0104, 1 comment)
- Re. fine-tune-llms-in-2024-with-trl.ipynb (#45, opened by andysingal, 6 comments)
- Instruction tuning of LLama2 is significantly slower compared to documented 3 hours fine-tuning time on A10G (#35, opened by mlscientist2, 2 comments)
- flash attention error on instruction tune llama-2 tutorial on Sagemaker notebook (#40, opened by matthewchung74, 10 comments)
- CUDA OOM error while saving the model (#16, opened by aasthavar, 4 comments)
- Precision Issue (#39, opened by zihaohe123, 11 comments)
- Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention? (#37, opened by ibicdev, 0 comments)
- Compute metrics while using SFT trainer (#34, opened by shubhamagarwal92, 1 comment)
- Cannot load tokenizer for llama2 (#33, opened by smreddy05, 6 comments)
- question about the llama instruction code (#28, opened by yeontaek, 0 comments)
- compute_metrics() function (#3, opened by ybagoury, 1 comment)
- gcc/cuda used for training (#24, opened by danyaljj, 3 comments)
- Colab notebook fails (#17, opened by TzurV, 6 comments)
- Error when training peft model example (#18, opened by Tachyon5, 2 comments)
- FLAN-T5 XXL using DeepSpeed fits well for training but gives OOM error for inference (#12, opened by irshadbhat, 5 comments)
- Inference on CNN validation set takes 2+ hours on p4dn.24xlarge machine with 8 A100s, 40GB each (#13, opened by sverneka, 4 comments)
- ValueError (#14, opened by Martok10, 7 comments)
- Error when finetuning Flan-T5-XXL on custom dataset (#10, opened by ngun7, 3 comments)
- OOM when finetuning FLANT5-xxl (#6, opened by AndrewZhe, 2 comments)
- Chat Inference Code (#4, opened by samarthsarin, 4 comments)