mosaicml/llm-foundry

Could you give detailed steps on how to run llm-foundry on AMD MI250 devices?

Alice1069 opened this issue · 1 comment

I would like to run llm-foundry on an AMD 4x MI250 machine.

Steps to reproduce the behavior:

  1. Build flash-attention for ROCm, following the latest instructions from https://github.com/ROCm/flash-attention/tree/flash_attention_for_rocm:
    start from the docker image rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1
    export GPU_ARCHS="gfx90a"
    export PYTHON_SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
    patch "${PYTHON_SITE_PACKAGES}/torch/utils/hipify/hipify_python.py" hipify_patch.patch
    pip install .
    Verified the build with PYTHONPATH=$PWD python benchmarks/benchmark_flash_attention.py;
    "pip list" shows "flash-attn 2.0.4".

  2. Get the llm-foundry v0.7 code and modify setup.py, replacing the torch pin so it matches the torch 2.0.1 that ships in the ROCm docker image (a scripted version of this edit is sketched below):

    - 'torch>=2.2.1,<2.3',
    + 'torch>=2.0,<2.0.2',
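
  If you want to script that edit (a hypothetical one-liner; it assumes the requirement string appears verbatim in setup.py):

    # adjust the pattern if the pin differs in your checkout
    sed -i "s/'torch>=2.2.1,<2.3',/'torch>=2.0,<2.0.2',/" setup.py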
  3. pip3 install --upgrade pip
  4. pip install -e .
  5. Commands to run:
    python data_prep/convert_dataset_hf.py \
      --dataset c4 --data_subset en \
      --out_root my-copy-c4 --splits train_small val_small \
      --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text '<|endoftext|>'
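
Before launching training, it is worth confirming that the conversion wrote both splits (a sketch; the exact shard file names depend on the streaming version, but each split directory should at least contain an index.json plus .mds shards):

    ls my-copy-c4/train_small my-copy-c4/val_small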

composer train/train.py train/yamls/pretrain/mpt-1b.yaml \
  data_local=my-copy-c4 \
  train_loader.dataset.split=train_small eval_loader.dataset.split=val_small \
  max_duration=10ba eval_interval=0 \
  loss_fn=torch_crossentropy save_folder=mpt-1b

  6. The run failed with a missing rotary_emb module.
  7. pip install rotary_emb
  8. Re-ran the command; it failed with a missing libcudart.so.11.0.
  9. Exported LD_LIBRARY_PATH to include libcudart.
  10. Re-ran the command; it failed with a missing libtorch_cuda.so.
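
The libcudart.so.11.0 and libtorch_cuda.so errors usually mean that a CUDA-built wheel (of torch itself, or of whatever provided rotary_emb) was pulled in on top of the ROCm builds when pip resolved the modified requirements. A quick way to check (a sketch; on a ROCm build torch.version.hip is set, and extensions should link against HIP rather than CUDA libraries):

    # if hip prints None, the ROCm torch was replaced by a CUDA wheel
    python -c "import torch; print(torch.__version__, torch.version.hip)"
    # libcudart/libtorch_cuda in this output means the extension was built or fetched for CUDA, not ROCm
    ldd $(python -c "import importlib.util; print(importlib.util.find_spec('rotary_emb').origin)") | grep -E 'cudart|torch_cuda|amdhip'

If that is what happened, a hedged workaround is to restore the ROCm torch from the docker image (or rerun pip install -e . --no-deps so pip cannot touch torch), rather than pointing LD_LIBRARY_PATH at CUDA libraries that cannot actually be used on this machine.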

Could you give me a detailed walkthrough of how to run llm-foundry on AMD MI250? I read through the two AMD blog posts but did not find the missing hint. Any version of the code is fine. Thank you!

Hi @Alice1069, the ROCm port of FlashAttention is likely fairly old at this point, so the easiest thing to do would be to disable FlashAttention. The other thing to try would be to manually comment out all the rotary-embedding codepaths and not use RoPE.
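
For reference, in the MPT YAMLs shipped with llm-foundry the attention implementation is normally selected through model.attn_config.attn_impl, so disabling FlashAttention should come down to an override along these lines (a sketch; double-check the key names in the yamls of your checkout):

    composer train/train.py train/yamls/pretrain/mpt-1b.yaml \
      data_local=my-copy-c4 \
      train_loader.dataset.split=train_small eval_loader.dataset.split=val_small \
      max_duration=10ba eval_interval=0 \
      loss_fn=torch_crossentropy save_folder=mpt-1b \
      model.attn_config.attn_impl=torch

If the rotary_emb import still trips after that, check whether the config enables RoPE (recent versions expose a model.attn_config.rope flag) and turn it off, or comment out those codepaths as suggested above.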