facebookincubator/AITemplate

Cannot Build fx2ait with setup.py

ioeddk opened this issue · 3 comments

When I try to build fx2ait with setup.py, I get the following error:

-- Added CUDA NVCC flags for: -gencode;arch=compute_86,code=sm_86
CMake Warning at /usr/local/lib/python3.8/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /usr/local/lib/python3.8/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:4 (find_package)


-- Found Torch: /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch.so  
-- Configuring done (3.7s)
-- Generating done (0.0s)
-- Build files have been written to: /home/AITemplate/fx2ait/build/temp.linux-x86_64-3.8
---------- Building extensions ----------------------------------------
[ 33%] Building CXX object CMakeFiles/ait_model.dir/fx2ait/csrc/AITModel.cpp.o
[ 66%] Building CXX object CMakeFiles/ait_model.dir/fx2ait/csrc/AITModelImpl.cpp.o
/home/AITemplate/fx2ait/fx2ait/csrc/AITModelImpl.cpp: In member function ‘void torch::aitemplate::AITModelImpl::allocateOutputs(std::vector<c10::intrusive_ptr<c10::StorageImpl> >&, std::vector<AITData>&, std::vector<std::vector<long int> >&, std::vector<long int*>&, const c10::Device&)’:
/home/AITemplate/fx2ait/fx2ait/csrc/AITModelImpl.cpp:328:44: error: ‘struct c10::StorageImpl’ has no member named ‘mutable_data’
  328 |     ait_outputs.emplace_back(storage_impl->mutable_data(), shape, ait_dtype);
      |                                            ^~~~~~~~~~~~
/home/AITemplate/fx2ait/fx2ait/csrc/AITModelImpl.cpp: In function ‘c10::ScalarType torch::aitemplate::{anonymous}::AITemplateDtypeToTorchDtype(AITemplateDtype)’:
/home/AITemplate/fx2ait/fx2ait/csrc/AITModelImpl.cpp:261:1: warning: control reaches end of non-void function [-Wreturn-type]
  261 | }
      | ^
make[2]: *** [CMakeFiles/ait_model.dir/build.make:90: CMakeFiles/ait_model.dir/fx2ait/csrc/AITModelImpl.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/ait_model.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
Traceback (most recent call last):
  File "setup.py", line 101, in <module>
    setup(
  File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 144, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 67, in run
    self.do_egg_install()
  File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 109, in do_egg_install
    self.run_command('bdist_egg')
  File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3/dist-packages/setuptools/command/bdist_egg.py", line 172, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/usr/lib/python3/dist-packages/setuptools/command/bdist_egg.py", line 158, in call_command
    self.run_command(cmdname)
  File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3/dist-packages/setuptools/command/install_lib.py", line 23, in run
    self.build()
  File "/usr/lib/python3.8/distutils/command/install_lib.py", line 109, in build
    self.run_command('build_ext')
  File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "setup.py", line 82, in run
    subprocess.check_call(cmake_cmd, cwd=self.build_temp)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j2']' returned non-zero exit status 2.

This is on an SM86 platform with CUDA 11.6. The build fails both on bare metal and in the Docker image. I've also tried an SM75 platform with CUDA 12.0, and it fails with the same error during the CMake build. The error occurs with both setup.py install and setup.py bdist_wheel.

cc the fx2ait POCs @wushirong and @frank-wei to take a look.

I also encountered this, running on an A10G instance.

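Updating your PyTorch version should solve this problem. StorageImpl->mutable_data() was only recently added, in pytorch/pytorch#97647, so an older PyTorch install fails with the "has no member named 'mutable_data'" error shown above.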
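
If it helps, here is a rough way to check whether the PyTorch in your build environment already ships the method before you rerun setup.py. This is only a sketch, not part of fx2ait: the helper names are made up for illustration, it assumes the headers live under the torch package's include/c10/core/StorageImpl.h, and it just searches the header text for a `mutable_data(` declaration, so treat the result as a hint rather than a guarantee.

```python
import re
from pathlib import Path


def storage_impl_header(torch_root):
    """Path to c10/core/StorageImpl.h under a torch install directory (assumed layout)."""
    return Path(torch_root) / "include" / "c10" / "core" / "StorageImpl.h"


def header_declares_mutable_data(header_path):
    """Heuristic: True if the header text contains `mutable_data(`.

    The word boundary and the `(` keep names such as `mutable_data_ptr(`
    from matching.
    """
    text = Path(header_path).read_text(encoding="utf-8", errors="replace")
    return re.search(r"\bmutable_data\s*\(", text) is not None


def check_installed_torch():
    """Return True/False for the installed torch, or None if it cannot be checked."""
    try:
        import torch  # imported lazily so the helpers above work without torch
    except ImportError:
        return None
    header = storage_impl_header(Path(torch.__file__).parent)
    if not header.is_file():
        return None
    return header_declares_mutable_data(header)


if __name__ == "__main__":
    result = check_installed_torch()
    if result is None:
        print("Could not locate torch or its StorageImpl.h header.")
    elif result:
        print("StorageImpl.h declares mutable_data().")
    else:
        print("mutable_data() not found; this PyTorch likely predates pytorch/pytorch#97647.")
```

Run it with the same Python interpreter that you use for setup.py, since the CMake step picks up Torch from that interpreter's site-packages (as the "Found Torch" line in the log shows).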