DeepLink-org/deeplink.framework

Question about kPrivateUse1

jianlonghaha opened this issue · 7 comments

For kPrivateUse1, do I need to register the Device myself?

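deeplink does not currently use the PrivateUse1 backend. When deeplink was being developed, PrivateUse1 was still very limited in what it could do, so we use the XPU device at the lower level.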

Is this the way to register it?
auto options = at::TensorOptions().dtype(scalar_type).device(at::kXPU);
return at::from_blob(data, tensor_sizes, tensor_strides, options); // create a Tensor from existing data

Registering it this way raises an error
when executing the addition result = a + b:

Traceback (most recent call last):
File "", line 1, in
RuntimeError: XPU device type not enabled.
Exception raised from getDeviceFromPtr at /home/pytorch/aten/src/ATen/Context.h:62 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x6c (0x7f5474d3928c in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xfa (0x7f5474cff10a in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: at::TensorMaker::make_tensor() + 0x8d1 (0x7f5476fa4e31 in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0xbe4f (0x7f53cca25e4f in /home/deeplink/deeplink.framework/dipu/third_party/DIOPI/impl/lib/libdiopi_impl.so)
frame #4: diopiAdd + 0x43 (0x7f53cca261d3 in /home/deeplink/deeplink.framework/dipu/third_party/DIOPI/impl/lib/libdiopi_impl.so)
frame #5: dipu::native::dipu_add_out(at::Tensor const&, at::Tensor const&, c10::Scalar const&, at::Tensor&) + 0x29c (0x7f53cf2744cc in /home/deeplink/deeplink.framework/dipu/torch_dipu/libtorch_dipu.so)
frame #6: dipu::native::dipu_add_tensor(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2c1 (0x7f53cf275b41 in /home/deeplink/deeplink.framework/dipu/torch_dipu/libtorch_dipu.so)
frame #7: c10::impl::wrap_kernel_functor_unboxed<c10::impl::detail::WrapFunctionIntoFunctor<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, c10::Scalar const&), &dipu::native::dipu_add_tensor>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, c10::Scalar const&> >, at::Tensor (at::Tensor const&, at::Tensor const&, c10::Scalar const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2b (0x7f53cf13223b in /home/deeplink/deeplink.framework/dipu/torch_dipu/libtorch_dipu.so)
frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x92 (0x7f5477340502 in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #9: + 0x41ec8f3 (0x7f5478f6c8f3 in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #10: + 0x41ecf0e (0x7f5478f6cf0e in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x17d (0x7f5477383d2d in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x472482 (0x7f548b76c482 in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #13: + 0x47263b (0x7f548b76c63b in /opt/conda/envs/py38/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #14: python() [0x4ea0f1]
frame #15: python() [0x54fe5a]
frame #16: python() [0x56ad21]

frame #22: python() [0x5a5bd1]
frame #23: python() [0x5a4bdf]
frame #24: python() [0x4c0e24]
frame #27: python() [0x45000c]
frame #29: __libc_start_main + 0xf3 (0x7f548c991083 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #30: python() [0x579d3d]

auto options = at::TensorOptions().dtype(scalar_type).device(at::kXPU);
return at::from_blob(data, tensor_sizes, tensor_strides, options); // create a Tensor from existing data

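That is what I had before. Just now I changed it to
return at::from_blob(data, tensor_sizes, tensor_strides); // create a Tensor from existing data
and it works now, haha! Dropping the registration fixed it.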

Wait, that's not right. With return at::from_blob(data, tensor_sizes, tensor_strides); doesn't it end up registered as a CPU tensor?

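That should be the right way to use it. As for why it raises an error, you will probably need to debug it. Creating a CPU tensor definitely will not work. Alternatively, you can refer to DIOPI, which contains a piece of simplified logic extracted from from_blob: https://github.com/DeepLink-org/DIOPI.dev/blob/main/impl/torch/build_aten.cpp buildATenSafeImpl(), used to construct a new tensor from data (of course, because the diopi torch layer uses aten directly, the newly created tensor there is CUDA, not XPU).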
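On the earlier question about dropping the options argument: as far as I know, at::from_blob then falls back to default TensorOptions, whose device is CPU, so the call succeeds only because the tensor is tagged as CPU while the pointer may still refer to device memory. The sketch below is a toy model of that behaviour in plain C++, not libtorch code. Device, BlobView and view_from_blob are hypothetical names made up for illustration. It shows a non-owning strided view over an existing buffer whose device tag defaults to CPU when the caller passes none.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not libtorch types.
enum class Device { CPU, XPU };

struct BlobView {
  float* data;                        // borrowed pointer, never freed by the view
  std::vector<std::int64_t> sizes;    // extent of each dimension
  std::vector<std::int64_t> strides;  // element step for each dimension
  Device device;                      // a tag only; the memory itself is not moved

  // Element lookup: offset = sum over i of index[i] * strides[i].
  float& at(const std::vector<std::int64_t>& index) const {
    assert(index.size() == strides.size());
    std::int64_t offset = 0;
    for (std::size_t i = 0; i < index.size(); ++i) {
      offset += index[i] * strides[i];
    }
    return data[offset];
  }
};

// Mirrors the shape of a from_blob-style call: when the caller passes no
// device, the view is tagged as CPU (assumed to match the libtorch default).
inline BlobView view_from_blob(float* data,
                               std::vector<std::int64_t> sizes,
                               std::vector<std::int64_t> strides,
                               Device device = Device::CPU) {
  assert(sizes.size() == strides.size());
  return BlobView{data, std::move(sizes), std::move(strides), device};
}
```

Calling view_from_blob(buf, {2, 3}, {3, 1}) on a six-element buffer gives a 2x3 row-major view tagged Device::CPU, and writes through the view land in buf because nothing is copied. Passing Device::XPU as the fourth argument changes only the tag, which is why the real call additionally needs the XPU device type to be enabled in the PyTorch build.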