abhijangda/fastkron

CMake Errors when doing "cmake .." in /fastkron/build

cwhsing opened this issue · 31 comments

Hi, when trying to install the library, in the process of "cmake .." in /fastkron/build,
I got two error messages:

CMake Error at tests/CMakeLists.txt:4 (ADD_SUBDIRECTORY):
  The source directory

    /home/minato/cwhsing/fastkron/tests/googletest

  does not contain a CMakeLists.txt file.


CMake Error at tests/benchmarks/CMakeLists.txt:1 (add_subdirectory):
  add_subdirectory given source "AnyOption" which is not an existing
  directory.

and failed to proceed.

Could you help me figure out what went wrong? Thanks!

Please clone with git's --recurse-submodules flag; both missing directories (tests/googletest and AnyOption) are git submodules. Let me know if this fixes the issue.
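For anyone hitting the same two errors, a quick check of whether the submodule directories were populated can save a failed configure. This is only a minimal sketch: the two paths are taken from the CMake errors above, and the helper name missing_cmake_dirs is hypothetical, not part of FastKron.

```python
from pathlib import Path

# Paths taken from the two CMake errors above, relative to the repository root.
SUBMODULE_DIRS = ["tests/googletest", "tests/benchmarks/AnyOption"]


def missing_cmake_dirs(repo_root, dirs=SUBMODULE_DIRS):
    """Return the directories that lack a CMakeLists.txt (hypothetical helper)."""
    root = Path(repo_root)
    return [d for d in dirs if not (root / d / "CMakeLists.txt").is_file()]


if __name__ == "__main__":
    missing = missing_cmake_dirs(".")
    if missing:
        print("Unpopulated submodules:", ", ".join(missing))
        # For an existing clone, this standard git command fetches them.
        print("Try: git submodule update --init --recursive")
    else:
        print("All submodule directories look populated.")
```

Run it from the repository root; an empty result suggests the submodules were cloned.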

Before building, also build CUDA kernels as mentioned in README.md.

FastKron does not yet support calling CUDA kernels for arbitrary sizes of Kronecker factors and input matrices. This work is in progress and will probably take a month or so.

Steps followed. One dumb question: where do I find the build directory?
I always run python setup.py install first to make /fastkron/build show up, but that might be the wrong step, since I now get tons of CMake errors instead of two.

The build directory is the one you create with mkdir build.

What errors are you getting? Can you paste them here?

I just created the directory using mkdir build and ran cmake .. from inside it. I got the following errors:

CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "single-cuda-no-fusion-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "single-cuda-fusion-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "single-cuda-TT-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "single-cuda-tuner-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "single-cuda-non-square-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "single-cuda-non-square-TT-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "single-cuda-distinct-shapes".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "single-cuda-odd-shapes".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "multi-cuda-no-fusion-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "multi-cuda-tuner-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target
  "multi-cuda-no-fusion-non-square-tests".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt


CMake Error in tests/cuda/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "multi-cuda-distinct-shapes".


CMake Error in tests/cuda/CMakeLists.txt:
  The language CUDA was requested for compilation but was not enabled.  To
  enable a language it needs to be specified in a 'project' or
  'enable_language' command in the root CMakeLists.txt

Aah. Thanks for pointing this out. Can you do git pull and redo the build steps?

Thanks, I got past the cmake .. step. But when running make -j, two more errors popped up:

[ 10%] Building CUDA object CMakeFiles/FastKron.dir/src/kernels/cuda/kron-kernels/cuda_256_8_8_8_512_2_3_0_1_2_float_4_4_N_N.cu.o
/home/minato/cwhsing/fastkron/src/handle/distrib_handle.cu:4:10: fatal error: nccl.h: No such file or directory
    4 | #include <nccl.h>
      |          ^~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/FastKron.dir/build.make:175: CMakeFiles/FastKron.dir/src/handle/distrib_handle.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 10%] Building CUDA object CMakeFiles/FastKron.dir/src/kernels/cuda/kron-kernels/cuda_64_8_8_4_1024_2_1_0_4_2_float_4_4_N_N.cu.o
/home/minato/cwhsing/fastkron/src/kernel_db/cuda_kernel_db.cu:4:10: fatal error: nccl.h: No such file or directory
    4 | #include <nccl.h>
      |          ^~~~~~~~
compilation terminated.

And the final message is:

make[1]: *** [CMakeFiles/Makefile2:270: CMakeFiles/FastKron.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

If you are building for CUDA then NCCL is required right now. Install NCCL from https://github.com/NVIDIA/nccl .

Successfully installed. Thanks a bunch for the guidance.
I've got a side question tho:
when using python (with torch), how do I use the module? Namely, what should I import and what's the callable function?

Great. I am still working on the project, so, it will be quite some time before this becomes very usable.

You can follow the code from line 182 onward in pyfastkron/fastkron.py. Those lines import fastkron and work on torch tensors.

I failed to follow. Could you elaborate or give an example?
Say if I have two random tensors a & b, how do I calculate fastkron(a, b)?

Here is the example from pyfastkron/fastkron.py with comments

  from functools import reduce

  import torch
  from fastkron import FastKronTorch

  fastKron = FastKronTorch()

  # Allocate torch tensors: M rows, N Kronecker factors of shape Ps[i] x Qs[i]
  M = 10
  N = 10
  Ps = [2] * N
  Qs = [2] * N

  # x is M x (product of Ps); y is M x (product of Qs)
  x = torch.ones((M, reduce((lambda a, b: a * b), Ps)), dtype=torch.float32).cuda()
  y = torch.zeros((M, reduce((lambda a, b: a * b), Qs)), dtype=torch.float32).cuda()
  fs = [torch.ones((Ps[i], Qs[i]), dtype=torch.float32).cuda() for i in range(N)]

  # Allocate a temporary tensor of the size reported by gekmmSizes
  rs, ts = fastKron.gekmmSizes(x, fs)
  t1 = torch.zeros(rs, dtype=torch.float32).cuda()

  # Tune to get the fastest CUDA kernels
  fastKron.gekmmTune(x, fs, y)

  # Run Kron-Matmul; the result is written into y
  fastKron.gekmm(x, fs, y, 1.0, 0.0, None, t1)
  print(y)

If I understand correctly, Ps and Qs are the input tensors?
If my inputs are a, b = torch.randn(4, 4), torch.randn(4, 4), what things should I change accordingly like M, fs or something else? Sorry it's still a bit hard to penetrate.

The operation being done here is:
y = x @ Kronecker Product of (fs[0], fs[1], ..., fs[N-1])

Each Kronecker factor fs[i] has shape Ps[i] x Qs[i].
x has shape (M, Ps[0] * Ps[1] * Ps[2] * ... * Ps[N-1])
y has shape (M, Qs[0] * Qs[1] * Qs[2] * ... * Qs[N-1])

In this code, each factor is 2 x 2 and there are N = 10 of them, so fs is a list of 10 factors, each of shape 2 x 2.

The above code requires at least 3 inputs, i.e., the matrix x and at least two factors (fs of length 2 or more).

In your case, if you have a, b, x = torch.randn(4,4), torch.randn(4,4), torch.randn(4,16), then with fs = [a, b] the above code will produce an output y of shape (4, 16).

I hope this helps. I also recommend reading the terminology in the arxiv paper linked in README.
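To make the shape rules above concrete, here is a small reference sketch in plain Python (lists of rows, no GPU). It forms the full Kronecker product explicitly, so it is only practical for tiny inputs and is meant for checking shapes and results, not as a description of how FastKron computes anything. The function names are hypothetical; on torch tensors, torch.kron gives the same pairwise product.

```python
from math import prod


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[ai * bj for ai in row_a for bj in row_b]
            for row_a in a for row_b in b]


def matmul(x, f):
    """Plain matrix product of two lists-of-rows matrices."""
    return [[sum(xi * fi for xi, fi in zip(row, col)) for col in zip(*f)]
            for row in x]


def kron_matmul_reference(x, fs):
    """Naive y = x @ (fs[0] kron fs[1] kron ... kron fs[N-1])."""
    full = fs[0]
    for f in fs[1:]:
        full = kron(full, f)
    return matmul(x, full)


def expected_shapes(m, fs):
    """Shapes of x and y implied by M and the factor shapes."""
    ps = [len(f) for f in fs]       # rows of each factor
    qs = [len(f[0]) for f in fs]    # columns of each factor
    return (m, prod(ps)), (m, prod(qs))


# The case from the question: two 4 x 4 factors and M = 4.
a = [[1.0] * 4 for _ in range(4)]
b = [[1.0] * 4 for _ in range(4)]
x = [[1.0] * 16 for _ in range(4)]
y = kron_matmul_reference(x, [a, b])
print(expected_shapes(4, [a, b]), len(y), len(y[0]))
```

With all-ones inputs, every entry of y equals the number of columns of x (16 here), which makes a convenient sanity check.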

Thanks, I kinda get it. So basically M is an independent parameter to be tuned and has nothing to do with N, right?
BTW, I just ran python fastkron.py but got the following errors:

Traceback (most recent call last):
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 183, in <module>
    fastKron = FastKronTorch()
               ^^^^^^^^^^^^^^^
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 98, in __init__
    self.pyfastkron = PyFastKronWrapper()
                      ^^^^^^^^^^^^^^^^^^^
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 20, in __init__
    self.libKron = ctypes.CDLL("libFastKron.so")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/minato/anaconda3/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libFastKron.so: cannot open shared object file: No such file or directory
Exception ignored in: <function PyFastKronWrapper.__del__ at 0x7f349cf52a20>
Traceback (most recent call last):
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 88, in __del__
    self.destroyFn(self.cpp_handle)
    ^^^^^^^^^^^^^^
AttributeError: 'PyFastKronWrapper' object has no attribute 'destroyFn'

Kernels will be tuned for all parameters (M, N, Ps[0 ... N-1], Qs[0 ... N-1]). I recommend reading the background section of the paper; that will make everything clear.

To run, set the LD_LIBRARY_PATH environment variable on the same command line as python: LD_LIBRARY_PATH=<path-to-build-directory> python <your file.py>.
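A minimal sketch for checking whether the library is visible through LD_LIBRARY_PATH before Python tries to load it. The helper name locate_shared_lib is hypothetical. It inspects only LD_LIBRARY_PATH, whereas the dynamic loader also consults other locations, so a miss here is a hint rather than proof.

```python
import os


def locate_shared_lib(name="libFastKron.so", search_path=None):
    """Return the first LD_LIBRARY_PATH entry containing `name`, or None."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = locate_shared_lib()
    print(found or "libFastKron.so not found on LD_LIBRARY_PATH")
```

ctypes.CDLL also accepts an absolute path, so the returned path can be passed to it directly when experimenting.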

I added export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/minato/cwhsing/fastkron/build in ~/.bashrc but it didn't work. I'm not so sure how to add the python <your file.py> to the environment variable. Could you specify the correct way? Thanks.

You can run your python file using the above command I mentioned which specifies LD_LIBRARY_PATH. In this way you do not need to change bashrc.

Can you try with the command I mentioned? If that does not work, then paste the error.

python <your file.py> refers to the command you use to execute your python file. Replace <your file.py> with the path to the python file that you are running.

I tried LD_LIBRARY_PATH=/home/minato/cwhsing/fastkron/build python /home/minato/cwhsing/fastkron/pyfastkron/fastkron.py but got the same error as above.

Can you do ls /home/minato/cwhsing/fastkron/build and make sure that libFastKron.so exists?

Yes it does exist in the build directory.

Can you try with

LD_LIBRARY_PATH=/home/minato/cwhsing/fastkron/build:"$LD_LIBRARY_PATH" python /home/minato/cwhsing/fastkron/pyfastkron/fastkron.py

What is your python version?

Can you also paste the error? There might be subtle differences that help with debugging.

My python version is 3.11.7.
The command without "$LD_LIBRARY_PATH" gave the same error. But you're right, if I try with the above command, the error message is:

Traceback (most recent call last):
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 186, in <module>
    fastKron = FastKronTorch()
               ^^^^^^^^^^^^^^^
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 101, in __init__
    self.pyfastkron = PyFastKronWrapper()
                      ^^^^^^^^^^^^^^^^^^^
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 23, in __init__
    self.libKron = ctypes.CDLL("libFastKron.so")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/minato/anaconda3/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/minato/anaconda3/lib/python3.11/site-packages/torch/lib/../../../.././libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/minato/cwhsing/fastkron/build/libFastKron.so)
Exception ignored in: <function PyFastKronWrapper.__del__ at 0x7f8b64022a20>
Traceback (most recent call last):
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 91, in __del__
    self.destroyFn(self.cpp_handle)
    ^^^^^^^^^^^^^^
AttributeError: 'PyFastKronWrapper' object has no attribute 'destroyFn'

where the part

OSError: /home/minato/anaconda3/lib/python3.11/site-packages/torch/lib/../../../.././libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/minato/cwhsing/fastkron/build/libFastKron.so)

is the new one.

This is a known issue with Anaconda. See https://stackoverflow.com/questions/72540359/glibcxx-3-4-30-not-found-for-librosa-in-conda-virtual-environment-after-tryin

I solved this error by running command: conda install -c conda-forge libstdcxx-ng=12
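To confirm which libstdc++ is too old, one can list the GLIBCXX version tags embedded in a given libstdc++.so.6 file. The sketch below scans the file's bytes for those tags, similar in spirit to running strings on the library; the function names are hypothetical, and the path to inspect is the one printed in the OSError.

```python
import re


def glibcxx_versions(lib_path):
    """Return the sorted GLIBCXX_x.y.z version tags found in a libstdc++ binary."""
    with open(lib_path, "rb") as fh:
        data = fh.read()
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data))
    return sorted(tuple(int(p) for p in t.decode().split(".")) for t in tags)


def provides(lib_path, wanted=(3, 4, 30)):
    """True if the library advertises the wanted GLIBCXX version."""
    return wanted in glibcxx_versions(lib_path)


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(glibcxx_versions(sys.argv[1])[-3:])
        print("has GLIBCXX_3.4.30:", provides(sys.argv[1]))
```

If the library that Python loads lacks 3.4.30 while the system copy has it, the two differ in version, which is consistent with the error above.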

The error was solved but another one emerges:

Segmentation fault (core dumped)

Here's my environment information:

Python: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
PyTorch: 2.2.1
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
GPU 0: NVIDIA Tesla V100-SXM3-32GB
Driver Version: 550.54.14
CUDA Version: 12.4
NVCC: Cuda compilation tools, release 11.5, V11.5.119

Can you do git pull, redo the build process, and then try again? I made some changes a few hours ago that should help with this problem.

I already had cuda-12.4 installed under /usr/local since the machine was set up, so I was able to install the whole thing successfully (by adding CUDACXX=/usr/local/cuda-12.4/bin/nvcc to /etc/environment). But a few hours ago, to get the nvcc command, I accidentally installed nvidia-cuda-toolkit, which now causes the following error during the build process that did not occur before:

CMake Error at /usr/local/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find CUDA: Found unsuitable version "11.5", but required is at
  least "12.0" (found /usr)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:598 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/share/cmake-3.29/Modules/FindCUDA.cmake:1291 (find_package_handle_standard_args)
  CMakeLists.txt:33 (find_package)

I tried to reinstall the toolkit following NVIDIA's official guide: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network
but the NVCC version is still 11.5. I then searched the internet but found no useful info about how to update NVCC.

UPDATE:
I just uninstalled and reinstalled the whole CUDA toolkit, and now the NVCC version is 12.4 instead of 11.5. But the same error message remains.

It is possible that some parts of CUDA 11.5 still remain. I recommend you fix that, because mixing CUDA 11.5 and CUDA 12.4 libraries might cause errors.
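One way to see which nvcc a build is likely to pick up is to ask the shell's PATH and parse the release line, which has the same format as the "Cuda compilation tools, release 11.5, V11.5.119" line in the environment listing above. This is a rough sketch with hypothetical helper names; CMake may still locate a different toolkit (for example through CUDACXX), so treat the result as a hint.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract (major, minor) from the output of nvcc --version, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def nvcc_on_path():
    """Return (path, (major, minor)) for the first nvcc on PATH, or (None, None)."""
    exe = shutil.which("nvcc")
    if exe is None:
        return None, None
    out = subprocess.run([exe, "--version"], capture_output=True, text=True).stdout
    return exe, parse_nvcc_release(out)


if __name__ == "__main__":
    print(nvcc_on_path())
```

If the reported path is /usr/bin/nvcc while the intended toolkit lives under /usr/local/cuda-12.4, the older distribution package is probably still shadowing it.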

I just downgraded the required CUDA version to 11.0 in FastKron. Do git pull and try again. If that does not work, then remove the build directory and then try.

Ahh I was just gonna post another update. I removed the whole local fastkron directory, git clone again, and then successfully built the library. And no parts of CUDA 11.5 remain as I last checked.

The command LD_LIBRARY_PATH=<path-to-build-directory> python <your file.py> does work. python fastkron.py prints quite a few lines, with the last few lines being:

Minimum Time 0.05 through kernels: 
  [0, 1] = 32768 64_8x8_1x512**2_0_2x4_NN runs for 0.03 ms
  [2, 4] = 32768 64_8x8_2x2048**3_0_4x8_NN runs for 0.03 ms
tensor([[32768., 32768., 32768.,  ..., 32768., 32768., 32768.],
        [32768., 32768., 32768.,  ..., 32768., 32768., 32768.],
        [32768., 32768., 32768.,  ..., 32768., 32768., 32768.],
        ...,
        [32768., 32768., 32768.,  ..., 32768., 32768., 32768.],
        [32768., 32768., 32768.,  ..., 32768., 32768., 32768.],
        [32768., 32768., 32768.,  ..., 32768., 32768., 32768.]],
       device='cuda:0')
Exception ignored in: <function PyFastKronWrapper.__del__ at 0x7fcd231faac0>
Traceback (most recent call last):
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 108, in __del__
AttributeError: 'NoneType' object has no attribute 'c_ulong'

Unfortunately, there's another error message. I'm not sure if it affects the calculation or not.

Good to know that.

I fixed this issue.

Just tried removing the whole directory, git clone and build as well as git pull and rebuild. The same error message:

Exception ignored in: <function PyFastKronWrapper.__del__ at 0x7fcd231faac0>
Traceback (most recent call last):
  File "/home/minato/cwhsing/fastkron/pyfastkron/fastkron.py", line 108, in __del__
AttributeError: 'NoneType' object has no attribute 'c_ulong'

persists for both approaches, however.

Interesting, I have removed that method for now. Will fix it later. Yes, this has no effect on the results.
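For context on the two __del__ messages in this thread: Python calls __del__ even when __init__ raised part-way through, which explains the earlier "no attribute 'destroyFn'" message, and during interpreter shutdown a module-level name can already be None, which is a plausible cause of the 'c_ulong' message. The sketch below shows one common defensive pattern. The Handle class and its loader argument are stand-ins invented for illustration, not FastKron's actual wrapper.

```python
class Handle:
    """Stand-in for a wrapper that owns a native resource (hypothetical)."""

    def __init__(self, loader):
        # Define every attribute that __del__ touches before anything can raise.
        self._destroy = None
        self._handle = None
        lib = loader()                  # may raise, e.g. OSError from ctypes.CDLL
        self._destroy = lib["destroy"]
        self._handle = lib["create"]()

    def close(self):
        """Release the resource once; later calls do nothing."""
        destroy, handle = self._destroy, self._handle
        self._destroy = self._handle = None
        if destroy is not None and handle is not None:
            destroy(handle)

    def __del__(self):
        # Uses only instance attributes, so it does not depend on module
        # globals that may already be cleared at interpreter shutdown.
        self.close()


# Usage with a fake library so the example runs without any native code.
released = []
fake_lib = {"create": lambda: 7, "destroy": released.append}
h = Handle(lambda: fake_lib)
h.close()
print(released)
```

Calling close() explicitly, rather than relying on __del__, also makes the release point predictable.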