Cambricon MLU: how to build the MLU backend of PaddleCustomDevice from source?
Because our environment requires Python 3.8, I cannot directly install the wheel package built for Python 3.10:
paddle_custom_mlu.whl
Could you provide the steps and commands for building PaddleCustomDevice from source? Thanks!
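Why the Python 3.10 wheel is rejected: under the wheel naming convention (PEP 427), a filename such as paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl is name-version-pythontag-abitag-platformtag, and pip only installs a wheel whose tags match the running interpreter. The sketch below parses those tags and compares the CPython tag with the current interpreter. The helper names are hypothetical, and it deliberately ignores compressed tag sets (e.g. py2.py3) and stable-ABI (abi3) wheels, which pip treats more permissively.

```python
import os
import sys
from typing import Tuple


def wheel_tags(filename: str) -> Tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename.

    Wheel names follow name-version[-build]-python-abi-platform.whl, so the
    last three dash-separated fields are always the compatibility tags.
    """
    name = os.path.basename(filename)
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def current_cpython_tag() -> str:
    """Tag of the running interpreter, e.g. 'cp38' on Python 3.8."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def matches_interpreter(filename: str) -> bool:
    """True if the wheel's python tag equals the running interpreter's tag."""
    python_tag, _, _ = wheel_tags(filename)
    return python_tag == current_cpython_tag()


if __name__ == "__main__":
    wheel = "build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl"
    print(wheel_tags(wheel))
    print(current_cpython_tag(), matches_interpreter(wheel))
```

On Python 3.8 the last line prints cp38 True for the cp38 wheel; the same check against a cp310 wheel returns False, which is why building from source with the target interpreter is needed.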
@YanhuiDua
@YanhuiDua Following the steps, I built the release/2.6 branch of PaddleCustomDevice from source with Python 3.8 and ran into some errors along the way.
Excerpts of the errors:
Submodule path 'Paddle': checked out '90138318312fbb60b0bdce8b0f4fb317879fe62e'
-- PADDLE_SOURCE_DIR=/home/wzy/PaddleCustomDevice/Paddle
-- Paddle version is 0.0.0
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
...
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Looking for C++ include inttypes.h - found
-- Looking for C++ include sys/types.h
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
...
-- Generating done
-- Build files have been written to: /home/wzy/PaddleCustomDevice/backends/mlu/build/third_party/mkldnn/src/extern_mkldnn-build
[ 15%] Performing build step for 'extern_mkldnn'
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
...
[ 24%] Building CXX object src/cpu/x64/CMakeFiles/dnnl_cpu_x64.dir/cpu_barrier.cpp.o
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
...
[ 25%] Building CXX object src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/passes/lower.cpp.o
-- Looking for snprintf - found
-- Looking for get_static_proc_name in unwind
-- Looking for get_static_proc_name in unwind - not found
-- Looking for UnDecorateSymbolName in dbghelp
-- Looking for UnDecorateSymbolName in dbghelp - not found
-- Performing Test HAVE___ATTRIBUTE__
-- Performing Test HAVE___ATTRIBUTE__ - Success
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_DEFAULT
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_DEFAULT - Success
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_HIDDEN
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_HIDDEN - Success
-- Performing Test HAVE___BUILTIN_EXPECT
-- Performing Test HAVE___BUILTIN_EXPECT - Success
-- Performing Test HAVE___SYNC_VAL_COMPARE_AND_SWAP
-- Performing Test HAVE___SYNC_VAL_COMPARE_AND_SWAP - Success
-- Performing Test HAVE_RWLOCK
-- Performing Test HAVE_RWLOCK - Failed
-- Performing Test HAVE___DECLSPEC
-- Performing Test HAVE___DECLSPEC - Failed
-- Performing Test STL_NO_NAMESPACE
-- Performing Test STL_NO_NAMESPACE - Failed
Even so, the wheel package still builds successfully.
After installing the wheel package:
wzy@gxnzx119:~/PaddleCustomDevice/backends/mlu$ python3 -m pip install build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl
Defaulting to user installation because normal site-packages is not writeable
Processing ./build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl
paddle-custom-mlu is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
WARNING: Error parsing dependencies of distro-info: Invalid version: '0.23ubuntu1'
WARNING: Error parsing dependencies of python-debian: Invalid version: '0.1.36ubuntu1'
Running a program that I had previously verified to work now fails with a Segmentation fault.
The stack trace is as follows:
Segmentation fault (core dumped)
wzy@gxnzx119:~/paddle_tests/models$ lldb python3
(lldb) target create "python3"
Current executable set to 'python3' (x86_64).
(lldb) run benchmark_ano.py
Process 3134839 launched: '/usr/bin/python3' (x86_64)
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
warning: (x86_64) /home/wzy/.local/lib/python3.8/site-packages/numpy.libs/libgfortran-040039e1.so.5.0.0 No LZMA support found for reading .gnu_debugdata section
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
warning: (x86_64) /home/wzy/.local/lib/python3.8/site-packages/pillow.libs/libXau-00ec42fe.so.6.0.0 No LZMA support found for reading .gnu_debugdata section
I0703 02:51:22.082170 3134839 init.cc:234] ENV [CUSTOM_DEVICE_ROOT]=/home/wzy/.local/lib/python3.8/site-packages/paddle_custom_device
I0703 02:51:22.082192 3134839 init.cc:143] Try loading custom device libs from: [/home/wzy/.local/lib/python3.8/site-packages/paddle_custom_device]
Process 3134839 stopped
* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0xf00000001)
frame #0: 0x0000000f00000001
error: memory read failed for 0xf00000000
(lldb) bt
* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0xf00000001)
* frame #0: 0x0000000f00000001
frame #1: 0x00007fffe3554c6e libphi.so`phi::CustomKernelMap::RegisterCustomKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, phi::KernelKey const&, phi::Kernel const&) + 622
frame #2: 0x00007fffaebdc83e libpaddle-custom-mlu.so`phi::KernelRegistrar::ConstructKernel(phi::RegType, char const*, char const*, common::DataLayout, phi::DataType, void (*)(phi::KernelKey const&, phi::KernelArgsDef*), void (*)(phi::KernelKey const&, phi::Kernel*), std::function<void (phi::KernelContext*)>, void*) (.constprop.371) + 2222
frame #3: 0x00007fffaebdcdfe libpaddle-custom-mlu.so`phi::KernelRegistrar::KernelRegistrar(phi::RegType, char const*, char const*, common::DataLayout, phi::DataType, void (*)(phi::KernelKey const&, phi::KernelArgsDef*), void (*)(phi::KernelKey const&, phi::Kernel*), std::function<void (phi::KernelContext*)>, void*) + 158
frame #4: 0x00007fffaeb3430e libpaddle-custom-mlu.so`__static_initialization_and_destruction_0(int, int) (.constprop.355) + 4062
How can I fix this?
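Before reinstalling anything, it can help to confirm which plugin library Paddle is actually loading. The log shows it scanning CUSTOM_DEVICE_ROOT (here /home/wzy/.local/lib/python3.8/site-packages/paddle_custom_device) just before the crash inside phi::CustomKernelMap::RegisterCustomKernel. The sketch below lists the .so files in that directory, newest first, so a stale plugin left over from an earlier install is easy to spot. The function name is hypothetical, and falling back to the user site-packages directory when the variable is unset is an assumption based on the path in this log.

```python
import os
import site
import time
from typing import List, Tuple


def list_plugin_libs(root: str) -> List[Tuple[str, float]]:
    """Return (filename, mtime) for every .so file in a plugin directory,
    newest first. A missing directory yields an empty list."""
    if not os.path.isdir(root):
        return []
    libs = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isfile(path) and name.endswith(".so"):
            libs.append((name, os.path.getmtime(path)))
    return sorted(libs, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Same variable Paddle reports in its init log; fall back (assumption)
    # to <user site-packages>/paddle_custom_device when it is not set.
    root = os.environ.get(
        "CUSTOM_DEVICE_ROOT",
        os.path.join(site.getusersitepackages(), "paddle_custom_device"),
    )
    print(f"plugin dir: {root}")
    for name, mtime in list_plugin_libs(root):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(f"{stamp}  {name}")
```

If the timestamp on libpaddle-custom-mlu.so predates the latest build, the new wheel was never actually installed.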
This looks like a problem with the third-party library pthread. We recommend using the official image, docker pull registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310, and setting up a Python 3.8 environment inside it for the build.
Alternatively, you can use these Dockerfiles as a reference to build a Python 3.8 image yourself:
paddle-mlu Dockerfile: https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/mlu/tools/dockerfile/Dockerfile.mlu.kylinv10.gcc82.py310
paddle-cpu Dockerfile: https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/custom_cpu/tools/dockerfile/Dockerfile.ubuntu20.x86_64.gcc84
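I tried again, setting up a Python 3.8 environment in the registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310 image and building there, and got the same errors as when building on the host. Could the build failure be caused by the PaddleCustomDevice version? If so, which version of PaddleCustomDevice is known to work?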
@YanhuiDua
Judging from this message, the package you built should be fine; you just need to reinstall it with --force-reinstall:
paddle-custom-mlu is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
WARNING: Error parsing dependencies of distro-info: Invalid version: '0.23ubuntu1'
WARNING: Error parsing dependencies of python-debian: Invalid version: '0.1.36ubuntu1'
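The pip message means the install was skipped: a paddle-custom-mlu with the same version string (0.0.0) was already present, so the previously installed plugin kept being loaded. A sketch of the reinstall step driven from Python follows. It uses pip's --force-reinstall option and, as an optional choice made here, --no-deps so pip does not also reinstall any declared dependencies; importlib.metadata (standard library since Python 3.8) then reports what is installed. The helper names are hypothetical.

```python
import subprocess
import sys
from importlib import metadata
from typing import List


def reinstall_command(wheel_path: str, no_deps: bool = True) -> List[str]:
    """Build a pip command that reinstalls a wheel even when the same
    version is already installed."""
    cmd = [sys.executable, "-m", "pip", "install", "--force-reinstall"]
    if no_deps:
        # Replace only this wheel; leave already-installed dependencies alone.
        cmd.append("--no-deps")
    cmd.append(wheel_path)
    return cmd


def installed_version(dist_name: str) -> str:
    """Version string of an installed distribution, or '' if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return ""


if __name__ == "__main__":
    wheel = "build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl"
    subprocess.run(reinstall_command(wheel), check=True)
    print("paddle_custom_mlu", installed_version("paddle_custom_mlu"))
```

Because the version stays 0.0.0 across builds, the version check alone cannot tell old and new builds apart; it only confirms the package is present after the forced reinstall.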
When I build a Python 3.8 container using the following Dockerfile,
paddle-cpu Dockerfile: https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/custom_cpu/tools/dockerfile/Dockerfile.ubuntu20.x86_64.gcc84
and the build reaches these commands,
# install Paddle requirement
RUN wget --no-check-certificate https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/python/requirements.txt -O requirements.txt && \
pip install -r requirements.txt -i https://pip.baidu-int.com/simple --trusted-host pip.baidu-int.com && rm -rf requirements.txt
RUN wget --no-check-certificate https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/python/unittest_py/requirements.txt -O requirements.txt && \
pip install -r requirements.txt -i https://pip.baidu-int.com/simple --trusted-host pip.baidu-int.com && rm -rf requirements.txt
the following error occurs:
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe36442f4f0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe36442f7f0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe36442f9a0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe36442fb50>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe364502ee0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
ERROR: Could not find a version that satisfies the requirement httpx (from versions: none)
ERROR: No matching distribution found for httpx
Running ping pip.baidu-int.com reports "Name or service not known". How can I fix this?
@qili93 @YanhuiDua
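The host pip.baidu-int.com in those RUN lines does not resolve from this network ("Name or service not known"). Judging by the name it appears to be an internal mirror, an assumption the thread does not confirm. A common workaround is to drop the -i ... --trusted-host ... options or point them at a public index such as https://pypi.org/simple. The sketch below checks whether each candidate index host resolves and picks the first that does; the candidate list and function names are illustrative, and a host that resolves may still be blocked by a firewall or proxy.

```python
import socket
from typing import List, Optional
from urllib.parse import urlparse


def host_resolves(host: str) -> bool:
    """True if DNS (or /etc/hosts) returns at least one address for host."""
    try:
        return len(socket.getaddrinfo(host, 443)) > 0
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return False


def first_resolvable_index(index_urls: List[str]) -> Optional[str]:
    """Return the first index URL whose hostname resolves, else None."""
    for url in index_urls:
        host = urlparse(url).hostname
        if host and host_resolves(host):
            return url
    return None


if __name__ == "__main__":
    candidates = [
        "https://pip.baidu-int.com/simple",  # index used in the Dockerfile
        "https://pypi.org/simple",           # public PyPI
    ]
    index = first_resolvable_index(candidates)
    if index:
        print(f"pip install -r requirements.txt -i {index}")
    else:
        print("no candidate index resolves; check DNS/proxy settings")
```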