Dao-AILab/flash-attention

Windows 11, Python 3.10, CUDA 11.7: can't install with these versions

TaucherLoong opened this issue · 5 comments

E:\.py_users\aiplus\lib\site-packages\torch\utils\cpp_extension.py:359: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
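
Note: the UserWarning above means torch's cpp_extension could not run cl, i.e. MSVC's cl.exe is not on PATH in the shell that launched the build. A minimal sketch of a fix, assuming the Visual Studio root implied by the D:\Program_Files\VC include paths in the log below (the exact vcvars64.bat location is an assumption), run from a plain cmd prompt:

rem Load the 64-bit MSVC environment so cl.exe is on PATH (path assumed from this log's include dirs)
call "D:\Program_Files\VC\Auxiliary\Build\vcvars64.bat"
rem Confirm cl.exe now resolves, then retry the build
where cl
python setup.py install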
building 'flash_attn_2_cuda' extension
Emitting ninja build file E:\workshop\llama_tuner\flash-attention\build\temp.win-amd64-cpython-310\Release\build.ninja...
Compiling objects...
Using envvar MAX_JOBS (2) as the number of workers...
[1/48] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\workshop\llama_tuner\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IE:\workshop\llama_tuner\flash-attention\csrc\flash_attn -IE:\workshop\llama_tuner\flash-attention\csrc\flash_attn\src -IE:\workshop\llama_tuner\flash-attention\csrc\cutlass\include -IE:\.py_users\aiplus\lib\site-packages\torch\include -IE:\.py_users\aiplus\lib\site-packages\torch\include\torch\csrc\api\include -IE:\.py_users\aiplus\lib\site-packages\torch\include\TH -IE:\.py_users\aiplus\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include" -IE:\.py_users\aiplus\include -Id:\pythonsdk\include -Id:\pythonsdk\Include -ID:\Program_Files\VC\Tools\MSVC\14.33.31629\include -ID:\Program_Files\VC\Auxiliary\VS\include "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c E:\workshop\llama_tuner\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o E:\workshop\llama_tuner\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: E:/workshop/llama_tuner/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
(ninja repeats the same nvcc command shown above)
flash_bwd_hdim128_bf16_sm80.cu
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
E:/.py_users/aiplus/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
E:/.py_users/aiplus/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:/.py_users/aiplus/lib/site-packages/torch/include\c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here

The errors above showed up after running python setup.py install.

The last line of the output is 'RuntimeError: Error compiling objects for extension'.
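
Note: everything quoted above is a warning (D9025, C4067, #186-D), not the fatal error, so the compiler error that actually triggered the RuntimeError appears to have been cut from the paste. One way to surface it, using the MAX_JOBS variable the build already honors (see "Using envvar MAX_JOBS (2)" in the log), is to rebuild serially and keep a log; a sketch for a Windows cmd prompt:

rem Build one translation unit at a time so the first real error is visible
set MAX_JOBS=1
rem Capture stdout and stderr for inspection
python setup.py install > build.log 2>&1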

Oh, and the torch version is 2.0.1.
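
Note: for a source build, the CUDA version torch was built against should match the local toolkit doing the compiling (11.7 here); a mismatch is a common cause of extension build failures. A quick check:

rem Print the torch version, the CUDA version torch was built against, and GPU availability
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"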

Give up. The prebuilt whl packages people have managed to build on GitHub require CUDA 12.1 at minimum; I haven't seen one for any lower version -.-
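
Note: the install route documented in the flash-attention README is the command below; in recent versions its setup.py tries to download a matching prebuilt wheel and only falls back to compiling from source when none exists. As the comment above says, a wheel matching Windows + CUDA 11.7 may simply not be available, in which case this will still compile locally:

rem Documented install command from the flash-attention README
pip install flash-attn --no-build-isolation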

I'm also on CUDA 11.7. Do you have a solution?