ControlNet compile error
dushwe opened this issue · 2 comments
In the same Docker environment, compiling CLIP/UNet/VAE succeeds, but compiling ControlNet fails.
env:
V100
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
GNU Make 4.1
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
packages:
aitemplate 0.3.dev0
alabaster 0.7.13
amqp 5.1.1
apeye 1.4.1
apeye-core 1.1.4
astroid 2.11.7
attrs 23.1.0
autodocsumm 0.2.11
Babel 2.13.0
backports.zoneinfo 0.2.1
beautifulsoup4 4.12.2
billiard 3.6.4.0
black 23.9.1
CacheControl 0.13.1
celery 5.1.2
certifi 2023.7.22
cffi 1.16.0
charset-normalizer 3.3.0
click 8.1.7
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.3.0
cryptography 41.0.4
cssutils 2.7.1
cuda-python 11.7.0
Cython 3.0.3
Deprecated 1.2.14
dict2css 0.3.0
diffusers 0.21.4
dill 0.3.7
docutils 0.18.1
domdf-python-tools 3.6.1
einops 0.7.0
exceptiongroup 1.1.3
filelock 3.12.4
fsspec 2023.9.2
gitdb 4.0.10
GitPython 3.1.37
hflow 1.3.0
html5lib 1.1
huggingface-hub 0.17.3
idna 3.4
imagesize 1.4.1
importlib-metadata 4.13.0
iniconfig 2.0.0
isort 5.12.0
Jinja2 3.1.2
kombu 5.3.2
lazy-object-proxy 1.9.0
libcst 1.1.0
MarkupSafe 2.1.3
mccabe 0.7.0
moreorless 0.4.0
mpmath 1.3.0
msgpack 1.0.7
mypy-extensions 1.0.0
natsort 8.4.0
numpy 1.24.4
opencv-python 4.2.0.32
packaging 23.2
parameterized 0.9.0
pathspec 0.11.2
Pillow 10.0.1
pip 23.2.1
platformdirs 3.11.0
pluggy 1.3.0
prompt-toolkit 3.0.39
pycparser 2.21
PyGithub 2.1.1
Pygments 2.16.1
PyJWT 2.8.0
pylint 2.13.9
PyNaCl 1.5.0
pytest 7.4.2
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
rabbitmq 0.2.0
redis 3.5.3
regex 2023.10.3
requests 2.31.0
retrying 1.3.3
ruamel.yaml 0.17.35
ruamel.yaml.clib 0.2.8
safetensors 0.4.0
setuptools 56.0.0
six 1.16.0
smmap 5.0.1
snowballstemmer 2.2.0
soupsieve 2.5
Sphinx 7.1.2
sphinx-autodoc-typehints 1.24.0
sphinx-gallery 0.14.0
sphinx-jinja2-compat 0.2.0
sphinx-prompt 1.7.0
sphinx-rtd-theme 1.3.0
sphinx-tabs 3.4.1
sphinx-toolbox 3.5.0
sphinxcontrib-applehelp 1.0.4
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.1
sphinxcontrib-inlinesyntaxhighlight 0.2
sphinxcontrib-jquery 4.1
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
stdlibs 2022.10.9
sympy 1.12
tabulate 0.9.0
timm 0.9.7
tokenizers 0.14.1
toml 0.10.2
tomli 2.0.1
tomlkit 0.12.1
torch 1.13.0+cu117
torchaudio 0.13.0+cu117
torchvision 0.14.0+cu117
tqdm 4.66.1
trailrunner 1.4.0
transformers 4.34.0
typing_extensions 4.8.0
typing-inspect 0.9.0
tzdata 2023.3
ufmt 2.2.0
urllib3 2.0.6
usort 1.0.7
vine 5.0.0
wcwidth 0.2.8
webencodings 0.5.1
wheel 0.38.4
wrapt 1.15.0
xmltodict 0.13.0
yacs 0.1.8
zipp 3.17.0
---- partial error log ----
82 errors detected in the compilation of "conv2d_bias_few_channels/conv2d_bias_few_channels_cutlass_h884fprop_fixed_channels_64x64_32x3_nhwc_align_4_8.cu".
make: *** [conv2d_bias_few_channels/conv2d_bias_few_channels_cutlass_h884fprop_fixed_channels_64x64_32x3_nhwc_align_4_8.obj] Error 255
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_policy.h(58): error: incomplete type is not allowed
detected during:
instantiation of class "cutlass::gemm::warp::MmaTensorOpPolicy<Operator_, OpDelta_> [with Operator_=cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, OpDelta_=cutlass::MatrixShape<1, 1>]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h(194): here
instantiation of class "cutlass::gemm::warp::MmaTensorOp<Shape_, ElementA_, LayoutA_, ElementB_, LayoutB_, ElementC_, LayoutC_, Policy_, PartitionsK_, AccumulatorsInRowMajor, Enable> [with Shape_=cutlass::gemm::GemmShape<64, 64, 32>, ElementA_=cutlass::half_t, LayoutA_=cutlass::layout::RowMajorTensorOpMultiplicandCrosswise<16, 32>, ElementB_=cutlass::half_t, LayoutB_=cutlass::layout::ColumnMajorTensorOpMultiplicandCrosswise<16, 32>, ElementC_=cutlass::half_t, LayoutC_=cutlass::layout::RowMajor, Policy_=cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, PartitionsK_=1, AccumulatorsInRowMajor=false, Enable=nv_bool]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h(494): here
instantiation of class "cutlass::epilogue::threadblock::DefaultEpilogueTensorOp<Shape, WarpMmaTensorOp, PartitionsK, OutputOp_, ElementsPerAccess, ScatterD, PermuteDLayout> [with Shape_=cutlass::gemm::GemmShape<256, 128, 32>, WarpMmaTensorOp_=cutlass::gemm::warp::MmaTensorOp<cutlass::gemm::GemmShape<64, 64, 32>, cutlass::half_t, cutlass::layout::RowMajorTensorOpMultiplicandCrosswise<16, 32>, cutlass::half_t, cutlass::layout::ColumnMajorTensorOpMultiplicandCrosswise<16, 32>, cutlass::half_t, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, 1, false, _nv_bool>, PartitionsK=1, OutputOp=cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, cutlass::half_t>, ElementsPerAccess=8, ScatterD=false, PermuteDLayout=cutlass::layout::NoPermute]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h(319): here
instantiation of class "cutlass::conv::kernel::DefaultConv2dFprop<ElementA, LayoutA, ElementB, LayoutB, ElementC, LayoutC, ElementAccumulator, cutlass::arch::OpClassTensorOp, ArchTag, ThreadblockShape, WarpShape, InstructionShape, EpilogueOutputOp, ThreadblockSwizzle, Stages, MathOperatorTag, cutlass::conv::IteratorAlgorithm::kFixedChannels, StrideSupport, AlignmentA, AlignmentB> [with ElementA=cutlass::half_t, LayoutA=cutlass::layout::TensorNHWC, ElementB=cutlass::half_t, LayoutB=cutlass::layout::TensorNHWC, ElementC=cutlass::half_t, LayoutC=cutlass::layout::TensorNHWC, ElementAccumulator=cutlass::half_t, ArchTag=cutlass::arch::Sm70, ThreadblockShape=cutlass::gemm::GemmShape<256, 128, 32>, WarpShape=cutlass::gemm::GemmShape<64, 64, 32>, InstructionShape=cutlass::gemm::GemmShape<8, 8, 4>, EpilogueOutputOp=cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, cutlass::half_t>, ThreadblockSwizzle=cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<4>, Stages=3, MathOperatorTag=cutlass::arch::OpMultiplyAdd, StrideSupport=cutlass::conv::StrideSupport::kStrided, AlignmentA=4, AlignmentB=4]"
conv2d_bias_few_channels/conv2d_bias_few_channels_cutlass_h884fprop_fixed_channels_256x128_32x3_nhwc_align_4_8.cu(64): here
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h(197): error: incomplete type is not allowed
detected during:
instantiation of class "cutlass::gemm::warp::MmaTensorOp<Shape_, ElementA_, LayoutA_, ElementB_, LayoutB_, ElementC_, LayoutC_, Policy_, PartitionsK_, AccumulatorsInRowMajor, Enable> [with Shape_=cutlass::gemm::GemmShape<64, 64, 32>, ElementA_=cutlass::half_t, LayoutA_=cutlass::layout::RowMajorTensorOpMultiplicandCrosswise<16, 32>, ElementB_=cutlass::half_t, LayoutB_=cutlass::layout::ColumnMajorTensorOpMultiplicandCrosswise<16, 32>, ElementC_=cutlass::half_t, LayoutC_=cutlass::layout::RowMajor, Policy_=cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, PartitionsK_=1, AccumulatorsInRowMajor=false, Enable=nv_bool]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h(494): here
instantiation of class "cutlass::epilogue::threadblock::DefaultEpilogueTensorOp<Shape, WarpMmaTensorOp, PartitionsK, OutputOp_, ElementsPerAccess, ScatterD, PermuteDLayout> [with Shape_=cutlass::gemm::GemmShape<256, 128, 32>, WarpMmaTensorOp_=cutlass::gemm::warp::MmaTensorOp<cutlass::gemm::GemmShape<64, 64, 32>, cutlass::half_t, cutlass::layout::RowMajorTensorOpMultiplicandCrosswise<16, 32>, cutlass::half_t, cutlass::layout::ColumnMajorTensorOpMultiplicandCrosswise<16, 32>, cutlass::half_t, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, 1, false, _nv_bool>, PartitionsK=1, OutputOp=cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, cutlass::half_t>, ElementsPerAccess=8, ScatterD=false, PermuteDLayout=cutlass::layout::NoPermute]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h(319): here
instantiation of class "cutlass::conv::kernel::DefaultConv2dFprop<ElementA, LayoutA, ElementB, LayoutB, ElementC, LayoutC, ElementAccumulator, cutlass::arch::OpClassTensorOp, ArchTag, ThreadblockShape, WarpShape, InstructionShape, EpilogueOutputOp, ThreadblockSwizzle, Stages, MathOperatorTag, cutlass::conv::IteratorAlgorithm::kFixedChannels, StrideSupport, AlignmentA, AlignmentB> [with ElementA=cutlass::half_t, LayoutA=cutlass::layout::TensorNHWC, ElementB=cutlass::half_t, LayoutB=cutlass::layout::TensorNHWC, ElementC=cutlass::half_t, LayoutC=cutlass::layout::TensorNHWC, ElementAccumulator=cutlass::half_t, ArchTag=cutlass::arch::Sm70, ThreadblockShape=cutlass::gemm::GemmShape<256, 128, 32>, WarpShape=cutlass::gemm::GemmShape<64, 64, 32>, InstructionShape=cutlass::gemm::GemmShape<8, 8, 4>, EpilogueOutputOp=cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, cutlass::half_t>, ThreadblockSwizzle=cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<4>, Stages=3, MathOperatorTag=cutlass::arch::OpMultiplyAdd, StrideSupport=cutlass::conv::StrideSupport::kStrided, AlignmentA=4, AlignmentB=4]"
conv2d_bias_few_channels/conv2d_bias_few_channels_cutlass_h884fprop_fixed_channels_256x128_32x3_nhwc_align_4_8.cu(64): here
Traceback (most recent call last):
File "./scripts/compile_controlnet.py", line 86, in <module>
compile_diffusers()
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "./scripts/compile_controlnet.py", line 73, in compile_diffusers
compile_controlnet(
File "/www/server/AITemplate/examples/05_stable_diffusion/src/compile_lib/compile_controlnet.py", line 117, in compile_controlnet
compile_model(
File "/usr/local/lib/python3.8/site-packages/aitemplate/utils/misc.py", line 93, in inner_function
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/aitemplate/compiler/compiler.py", line 276, in compile_model
compiler.transform.profile(
File "/usr/local/lib/python3.8/site-packages/aitemplate/compiler/transform/profile.py", line 92, in profile
compile_engine.make_profilers(generated_profilers, profiler_dir)
File "/usr/local/lib/python3.8/site-packages/aitemplate/backend/builder.py", line 835, in make_profilers
_run_make_cmds(
File "/usr/local/lib/python3.8/site-packages/aitemplate/backend/builder.py", line 183, in _run_make_cmds
raise RuntimeError("Build has failed.")
RuntimeError: Build has failed.
log file:
controlnet_log.txt
How can I solve this problem?
On an A100 the problem does not appear. How can I solve it on the V100?
Hi @dushwe Thank you for reporting the issue. We don't support Stable Diffusion on V100.
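Since the failing kernels are instantiated for `cutlass::arch::Sm70` (the V100) and the same build works on an A100, a pre-flight check of the GPU's compute capability can fail fast before the long profiler build starts. The following is only a minimal sketch: the helper names and the `(8, 0)` threshold are assumptions drawn from this thread (A100 works, V100 does not), not something AITemplate provides.

```python
from typing import Optional, Tuple

# Assumption: sm_80 (A100) is the lowest compute capability reported to work
# in this thread; V100 is sm_70. Adjust if the maintainers state otherwise.
MIN_CAPABILITY: Tuple[int, int] = (8, 0)


def is_supported(capability: Tuple[int, int],
                 minimum: Tuple[int, int] = MIN_CAPABILITY) -> bool:
    """Return True if (major, minor) is at least the assumed minimum."""
    return tuple(capability) >= tuple(minimum)


def current_capability() -> Optional[Tuple[int, int]]:
    """Return the (major, minor) capability of the default CUDA device.

    Returns None when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


def check_gpu_or_raise() -> None:
    """Hypothetical guard to call before starting the ControlNet compile."""
    cap = current_capability()
    if cap is None:
        raise RuntimeError("No CUDA device detected.")
    if not is_supported(cap):
        raise RuntimeError(
            f"GPU compute capability sm_{cap[0]}{cap[1]} is below the assumed "
            f"minimum sm_{MIN_CAPABILITY[0]}{MIN_CAPABILITY[1]}."
        )
```

Calling `check_gpu_or_raise()` at the top of a compile script would stop with a short message on a V100 instead of producing the CUTLASS template errors shown above.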