SJTU-IPADS/fgnn-artifacts

Is dgl_install.sh correct?


While installing DGL with dgl_install.sh, I ran into many errors.
Also, how does CONDA_PREFIX work? I don't have this environment variable after installing Anaconda.

The output looks like this:

-- Start configuring project dgl
-- Build with CUDA support
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.1
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda-10.1/lib64/libcudart.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/lib/x86_64-linux-gnu/libcublas.so
-- Found CUDA_CURAND_LIBRARY=/usr/local/cuda-10.1/lib64/libcurand.so
-- Detected CUDA of version 10.1. Use external CUB/Thrust library.
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Build with OpenMP.
-- Build with LIBXSMM optimization.
-- -fopenmp -O2 -Wall -fPIC -std=c++11  -DUSE_AVX -DUSE_LIBXSMM -DDGL_CPU_LLC_SIZE=40000000 -DIDXTYPEWIDTH=64 -DREALTYPEWIDTH=32
-- Running GPU architecture autodetection
-- Found CUDA arch 7.5
-- CUDA flags: -Xcompiler ,-fopenmp,-O2,-Wall,-fPIC,,,-DUSE_AVX,-DUSE_LIBXSMM,-DDGL_CPU_LLC_SIZE=40000000,-DIDXTYPEWIDTH=64,-DREALTYPEWIDTH=32;--expt-relaxed-constexpr;-gencode;arch=compute_75,code=sm_75;--expt-extended-lambda;-Wno-deprecated-declarations;-std=c++14
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- /home/chenhy/fgnn-artifacts/3rdparty/dgl/third_party/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Configuring done
-- Generating done
-- Build files have been written to: /home/chenhy/fgnn-artifacts/3rdparty/dgl/build
~/fgnn-artifacts/3rdparty/dgl/build ~/fgnn-artifacts/3rdparty/dgl ~
Consolidate compiler generated dependencies of target dmlc
Consolidate compiler generated dependencies of target metis
[  4%] Built target dmlc
[ 30%] Built target metis
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using Python interpreter: python
Traceback (most recent call last):
  File "/home/chenhy/fgnn-artifacts/3rdparty/dgl/tensoradapter/pytorch/find_cmake.py", line 1, in <module>
    import torch
ImportError: No module named torch
-- find_cmake.py output: 
CMake Error at CMakeLists.txt:16 (list):
  list GET given empty list


CMake Error at CMakeLists.txt:17 (list):
  list GET given empty list


-- Configuring for PyTorch 
-- Setting directory to /Torch
CMake Error at CMakeLists.txt:22 (find_package):
  By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Torch", but
  CMake did not find one.

  Could not find a package configuration file provided by "Torch" with any of
  the following names:

    TorchConfig.cmake
    torch-config.cmake

  Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
  "Torch_DIR" to a directory containing one of the above files.  If "Torch"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!
See also "/home/chenhy/fgnn-artifacts/3rdparty/dgl/tensoradapter/pytorch/build/CMakeFiles/CMakeOutput.log".
CMakeFiles/tensoradapter_pytorch.dir/build.make:70: recipe for target 'CMakeFiles/tensoradapter_pytorch' failed
make[2]: *** [CMakeFiles/tensoradapter_pytorch] Error 1
CMakeFiles/Makefile2:175: recipe for target 'CMakeFiles/tensoradapter_pytorch.dir/all' failed
make[1]: *** [CMakeFiles/tensoradapter_pytorch.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
================================================================================
LIBXSMM master-1.16.1-1534 (Linux@chenhy-MS-7C22)
--------------------------------------------------------------------------------
GNU Compiler Collection: gcc 7.5.0, and g++ 7.5.0
C / C++ target: -msse4.2
Fortran Compiler is disabled or missing: no Fortran interface is built!
--------------------------------------------------------------------------------
--- LIBXSMM build log
/usr/bin/ar: creating lib/libxsmmnoblas.a
/usr/bin/ar: creating lib/libxsmmgen.a
/usr/bin/ar: creating lib/libxsmm.a
/usr/bin/ar: creating lib/libxsmmext.a
================================================================================
LIBXSMM master-1.16.1-1534 (Linux@chenhy-MS-7C22)
--------------------------------------------------------------------------------
GNU Compiler Collection: gcc 7.5.0, and g++ 7.5.0
C / C++ target: -msse4.2
Fortran Compiler is disabled or missing: no Fortran interface is built!
--------------------------------------------------------------------------------
BLAS dependency and fallback is removed!
--------------------------------------------------------------------------------
[ 30%] Built target libxsmm
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2
~/fgnn-artifacts/3rdparty/dgl ~
~/fgnn-artifacts/3rdparty/dgl/python ~/fgnn-artifacts/3rdparty/dgl ~
Traceback (most recent call last):
  File "setup.py", line 7, in <module>
    from setuptools import find_packages
ImportError: No module named setuptools
~/fgnn-artifacts/3rdparty/dgl ~
~

Or did something go wrong earlier, when applying dgl.patch?
There were many "patch failed" messages.

It seems that you skipped several dependencies, such as PyTorch and setuptools. Please refer to the Installation section of the README.

Most of our dependencies are installed in a conda environment. If a conda environment is properly activated, there should be an environment variable named CONDA_PREFIX, and by default the environment name should appear in the command-line prompt.
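
To check these preconditions before running dgl_install.sh, a small preflight script can help. The sketch below is only an illustration and is not part of this repo: the function name `preflight` and its parameters are hypothetical, and the default module list (torch, setuptools) is taken from the two ImportError messages in the log above. It reports whether CONDA_PREFIX is set and whether each module is importable by the interpreter that runs the script. Run it with the same `python` that the build picks up, since the log shows the build using a plain `python` interpreter.

```python
import importlib.util
import os
import sys


def preflight(modules=("torch", "setuptools"), env=None):
    """Return a list of problems found; an empty list means all checks passed.

    `preflight` is a hypothetical helper for illustration, not part of the repo.
    """
    env = os.environ if env is None else env
    problems = []
    # conda sets CONDA_PREFIX when an environment is activated.
    if not env.get("CONDA_PREFIX"):
        problems.append("CONDA_PREFIX is not set; no conda environment appears to be active")
    # find_spec returns None when a top-level module cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module {name!r} is not importable by {sys.executable}")
    return problems


# Report the interpreter in use and anything that is missing.
print("interpreter:", sys.executable)
for problem in preflight():
    print("MISSING:", problem)
```

If the script prints no MISSING lines, the interpreter on the PATH can see the conda environment and both modules. Otherwise, activating the environment described in the README and installing the missing packages there is the likely next step.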

As for the errors during patching DGL, the log you provided seems unrelated to the failed patches. Please provide the full workflow and output of your installation.

We also recommend moving to the new repo. This repo is a snapshot for the EuroSys '22 artifact evaluation (AE), and future maintenance will happen in the new repo. You are welcome to open a new issue there, or to continue the discussion in this thread if you prefer.

I finally found my problem: something was missing when I installed cuDNN.
I also ran ./fgnn-artifacts/3rdparty/dgl_install.sh as root.
The errors during patching do not affect the installation.
It is running well now.

Thanks for your answer.
This will be a great, groundbreaking paper and project, just like DeepSpeed for DNNs.