Dao-AILab/flash-attention

[Bug] Compatibility issue with torch 2.2.0

tongbaojia opened this issue

Hi. First of all, thanks for making and maintaining this amazing repo.

We ran into this issue just two days ago: there appears to be a compatibility issue with torch 2.2.0 onwards.

conda create -n test_env python=3.10
conda activate test_env
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE pip install flash-attn==2.5.2 --no-build-isolation

Then opening Python and importing flash_attn returns an error (torch version is 2.2.0) like:

ImportError: .../test_env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
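
For reference, an undefined torch symbol like this usually means the prebuilt flash_attn_2_cuda extension was compiled against a different torch build than the one that ends up installed. A quick way to check what the environment actually has (a minimal sketch, not specific to flash-attn):

# Show the installed torch build: version, CUDA toolkit, and C++11 ABI flag
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.compiled_with_cxx11_abi())"
# Show which flash-attn wheel pip installed
pip show flash-attn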

We don't observe this issue with flash_attn 2.5.2 and torch 2.1.2, but it definitely exists for us with flash_attn 2.5.2 and torch 2.2.0 (and nightly). Our current workaround is to use torch 2.1, but we would definitely love to resolve this for future compatibility.
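
For completeness, the workaround is just pinning torch to 2.1.x when the environment is created, roughly like this (a sketch of our setup; assumes the 2.1.2 builds are still available on the pytorch channel):

# Pin torch to the last known-good version instead of letting conda pick 2.2.0
conda install pytorch==2.1.2 torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE pip install flash-attn==2.5.2 --no-build-isolation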

Have you tried compiling manually?

  1. pip uninstall flash-attn
  2. git clone flash-attn + python setup.py install (roughly as sketched below)
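
Something like this, assuming a standard setup (MAX_JOBS just limits parallel compile jobs on machines with limited RAM; drop it if you have plenty):

# Remove the prebuilt wheel first
pip uninstall -y flash-attn
# Build the CUDA extension locally against the currently installed torch
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=4 python setup.py install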

We are running flash-attn 2.5.2 under torch 2.2.0 with no issues, but we try to avoid using pip for flash-attn for various reasons.

Thanks! Seems to work for me! Will test a bit more.

Still, I'm curious what those various reasons are? Is it bad practice to pip install flash-attn in general?

@tongbaojia For us, we just find that any package that compiles custom C/C++ extensions, such as flash-attn, is very unreliable for pip to manage properly. If a base package such as torch is upgraded, pip doesn't understand that the dependent packages are tied to a specific torch build and need to be recompiled. The same thing happened to us with DeepSpeed, where we recently had to manually clear the local C/C++ extension cache dir it uses so that it would recompile its kernels after a torch upgrade.
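
In case it helps anyone hitting the same thing, a rough sketch of the manual fix-ups we mean (the torch_extensions path is torch's default JIT build cache; it moves if TORCH_EXTENSIONS_DIR is set):

# Rebuild flash-attn against the newly installed torch instead of reusing a cached wheel/build
pip install flash-attn --no-build-isolation --no-cache-dir --force-reinstall
# For DeepSpeed's JIT-compiled ops, clear the torch extension build cache so they get recompiled
rm -rf ~/.cache/torch_extensions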

Thanks again. This is starting to sound like a pip issue to me. Hopefully it will get resolved in the future.

I will close this for now, as there seem to be no immediate action items on the flash-attn side.