DeepLink-org/deeplink.framework

Building dipu fails on Ascend + PyTorch 2.1.1

Jipengfei134 opened this issue · 2 comments

On an Ascend 910a machine, building dipu against PyTorch 2.1.1 fails with the errors below, while PyTorch 2.0.0 builds without problems:

[79/561] Building CXX object ascend_npu/CMakeFiles/diopi_impl.dir/torch_npu/csrc/DIOPIAdapter.cpp.o
FAILED: ascend_npu/CMakeFiles/diopi_impl.dir/torch_npu/csrc/DIOPIAdapter.cpp.o
/usr/bin/c++ -DBUILD_LIBTORCH -DTEST_USE_ADAPTOR -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Ddiopi_impl_EXPORTS -I/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/../adaptor/csrc -isystem /cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/../proto/include -isystem /cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/third_party/acl/inc -isystem /home/ma-user/anaconda3/envs/PyTorch-2.0.1/include/python3.9 -isystem /home/ma-user/anaconda3/envs/PyTorch-2.0.1/lib/python3.9/site-packages/torch/include -isystem /home/ma-user/anaconda3/envs/PyTorch-2.0.1/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem _deps/op_plugin-src -isystem /cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu -isystem /cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu -isystem /cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/../third_party/half/include -Wall -Wno-sign-compare -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-return-type -Wno-unused-function -Wno-unused-but-set-variable -Wno-unused-variable -g -O0 -O2 -DNDEBUG -fPIC -std=c++1z -MD -MT ascend_npu/CMakeFiles/diopi_impl.dir/torch_npu/csrc/DIOPIAdapter.cpp.o -MF ascend_npu/CMakeFiles/diopi_impl.dir/torch_npu/csrc/DIOPIAdapter.cpp.o.d -o ascend_npu/CMakeFiles/diopi_impl.dir/torch_npu/csrc/DIOPIAdapter.cpp.o -c /cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:59:5: error: static assertion failed: at::ScalarType::Undefined and ACL_DT_UNDEFINED is not match any more, please check AT_ALL_SCALAR_TYPE_AND_ACL_DATATYPE_PAIR and modify it
static_assert(kATenScalarTypeToAclDataTypeTable[static_cast<int64_t>(at_dtype)] == (acl_dtype),
^
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:59:5: note: in definition of macro ‘ENUM_PAIR_FUNC’
static_assert(kATenScalarTypeToAclDataTypeTable[static_cast<int64_t>(at_dtype)] == (acl_dtype),
^~~~~~~~~~~~~
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:63:1: note: in expansion of macro ‘AT_ALL_SCALAR_TYPE_AND_ACL_DATATYPE_PAIR’
AT_ALL_SCALAR_TYPE_AND_ACL_DATATYPE_PAIR(ENUM_PAIR_FUNC)
^
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:59:5: error: static assertion failed: at::ScalarType::NumOptions and ACL_DT_UNDEFINED is not match any more, please check AT_ALL_SCALAR_TYPE_AND_ACL_DATATYPE_PAIR and modify it
static_assert(kATenScalarTypeToAclDataTypeTable[static_cast<int64_t>(at_dtype)] == (acl_dtype),
^
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:59:5: note: in definition of macro ‘ENUM_PAIR_FUNC’
static_assert(kATenScalarTypeToAclDataTypeTable[static_cast<int64_t>(at_dtype)] == (acl_dtype),
^~~~~~~~~~~~~
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:63:1: note: in expansion of macro ‘AT_ALL_SCALAR_TYPE_AND_ACL_DATATYPE_PAIR’
AT_ALL_SCALAR_TYPE_AND_ACL_DATATYPE_PAIR(ENUM_PAIR_FUNC)
^
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp: In function ‘void at_npu::native::assert_no_partial_overlap(const at::Tensor&, const at::Tensor&)’:
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:996:58: error: static_cast from type ‘const void*’ to type ‘char*’ casts away qualifiers
const auto a_begin = static_cast<char*>(a->data());
^
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:998:58: error: static_cast from type ‘const void*’ to type ‘char*’ casts away qualifiers
const auto b_begin = static_cast<char*>(b->data());
^
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp: In static member function ‘static aclError at_npu::native::CalcuOpUtil::AclrtMemcpyWithModeSwitch(const StorageAndOffsetMemSizePair&, size_t, const StorageAndOffsetMemSizePair&, size_t, aclrtMemcpyKind)’:
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:1035:79: error: static_cast from type ‘const void*’ to type ‘uint8_t* {aka unsigned char*}’ casts away qualifiers
void* dst_ptr = static_cast<void*>(static_cast<uint8_t*>(dst.first->data()) + dst.second);
^
/cx/lqy/code/deeplink.framework/dipu/third_party/DIOPI/impl/ascend_npu/torch_npu/csrc/DIOPIAdapter.cpp:1036:79: error: static_cast from type ‘const void*’ to type ‘uint8_t* {aka unsigned char*}’ casts away qualifiers
void* src_ptr = static_cast<void*>(static_cast<uint8_t*>(src.first->data()) + src.second);

Is building against this PyTorch version not supported?
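
For readers following the log: the two `static assertion failed` errors come from a compile-time check that a lookup table indexed by `at::ScalarType` still lines up with the ACL data types. One plausible reading, stated here as an assumption, is that the `ScalarType` enum differs between the two PyTorch versions, so the trailing `Undefined` and `NumOptions` entries no longer sit at the indices the table expects. The following is a minimal sketch of that pattern. The enums and the `ToAclDataType` helper are hypothetical stand-ins, not the real `at::ScalarType`, `aclDataType`, or DIOPI code.

```cpp
#include <cstdint>

// Hypothetical stand-ins; the real types are at::ScalarType and aclDataType.
enum class ScalarType : int8_t { Byte, Float, Half, Undefined, NumOptions };
enum AclDataType { ACL_DT_UNDEFINED = -1, ACL_FLOAT = 0, ACL_FLOAT16 = 1, ACL_UINT8 = 4 };

// One (ScalarType, AclDataType) pair per enumerator, listed in enum order.
#define SCALAR_TYPE_AND_ACL_DATATYPE_PAIR(_)  \
  _(ScalarType::Byte, ACL_UINT8)              \
  _(ScalarType::Float, ACL_FLOAT)             \
  _(ScalarType::Half, ACL_FLOAT16)            \
  _(ScalarType::Undefined, ACL_DT_UNDEFINED)  \
  _(ScalarType::NumOptions, ACL_DT_UNDEFINED)

// Lookup table indexed by the integer value of ScalarType.
constexpr AclDataType kScalarTypeToAclTable[static_cast<int64_t>(ScalarType::NumOptions) + 1] = {
#define DEFINE_ENUM(at_dtype, acl_dtype) acl_dtype,
    SCALAR_TYPE_AND_ACL_DATATYPE_PAIR(DEFINE_ENUM)
#undef DEFINE_ENUM
};

// Compile-time check: every pair must match the table slot at the enum's index.
// If enumerators are added to ScalarType without updating the pair list,
// the slots shift and these assertions fail at build time.
#define ENUM_PAIR_FUNC(at_dtype, acl_dtype)                                           \
  static_assert(kScalarTypeToAclTable[static_cast<int64_t>(at_dtype)] == (acl_dtype), \
                #at_dtype " and " #acl_dtype " no longer match");
SCALAR_TYPE_AND_ACL_DATATYPE_PAIR(ENUM_PAIR_FUNC)
#undef ENUM_PAIR_FUNC

// Hypothetical helper: table lookup by scalar type.
constexpr AclDataType ToAclDataType(ScalarType t) {
  return kScalarTypeToAclTable[static_cast<int64_t>(t)];
}
```

With this layout, a pair list written for one version of the enum fails to compile against another version, which is the intended early warning rather than a silent wrong mapping at run time.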

Building against PyTorch 2.1.1 is already supported. You have probably not updated the DIOPI submodule to the latest branch:
cd deeplink.framework/dipu/
git submodule update
git submodule sync
Then rebuild and try again.

DIOPI needs to be updated to at least DeepLink-org/DIOPI#1085 or later.
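
As background on the `casts away qualifiers` errors in the log: the compiler reports that `data()` returns `const void*`, and `static_cast` is not allowed to drop `const`. The sketch below illustrates the two usual ways to resolve this kind of error, under stated assumptions. `FakeStorage`, `BeginOf`, and `OffsetPtr` are hypothetical names for illustration only. The `mutable_data()` accessor mirrors what newer PyTorch storage classes are assumed to provide, and none of this is claimed to be what the referenced DIOPI change actually does.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a storage class whose data() is const-qualified.
class FakeStorage {
 public:
  const void* data() const { return buf_; }  // read-only view
  void* mutable_data() { return buf_; }      // writable view (assumed accessor)
  std::size_t nbytes() const { return sizeof(buf_); }

 private:
  unsigned char buf_[16] = {};
};

// Read-only use (e.g. overlap checks): keep const in the target type.
// static_cast<char*>(s.data()) would not compile here.
inline const char* BeginOf(const FakeStorage& s) {
  return static_cast<const char*>(s.data());
}

// Writable use (e.g. a memcpy destination): ask for a mutable pointer
// instead of casting const away.
inline void* OffsetPtr(FakeStorage& s, std::size_t offset) {
  return static_cast<void*>(static_cast<std::uint8_t*>(s.mutable_data()) + offset);
}
```

Keeping `const` for read-only paths and requesting a mutable pointer only where a write is intended avoids `const_cast` and keeps the intent visible at the call site.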