ROCm/Tensile

TypeError: sequence item 0: expected str instance, NoneType found for compilerArgs

waheedi opened this issue · 8 comments

I'm currently trying to build Tensile on 24.04 using the latest develop branches.

I had to bypass the asmCapCache verification to reach the compilation step, but then I hit another error: cxxCompiler ends up as None, even after going through several attempts to set it.

printing our args as we have a None Type error at the moment [None, '-D__HIP_HCC_COMPAT_MODE__=1', '--cuda-device-only', '-x', 'hip', '-O3', '-I', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile', '-Xoffload-linker', '--build-id', '--offload-arch=gfx1010', '--offload-arch=gfx803', '--offload-arch=gfx1030', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile/TensileLibrary_Type_I8I_HPA_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback.cpp', '-c', '-o', '/home/bargo/projects/rocm-setup/rocBLAS/build/library/src/build_tmp/TENSILE/code_object_tmp/TensileLibrary_Type_I8I_HPA_Contraction_l_Ailk_Bjlk_Cijk_Dijk_fallback.o'] launcher: [] hipFlags: ['-D__HIP_HCC_COMPAT_MODE__=1', '--cuda-device-only', '-x', 'hip', '-O3', '-I', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile', '-Xoffload-linker', '--build-id']  archFlags: ['--offload-arch=gfx1010', '--offload-arch=gfx803', '--offload-arch=gfx1030'] 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
        ^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/joblib/parallel.py", line 589, in __call__
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/bargo/projects/rocm-setup/rocBLAS/build/virtualenv/lib/python3.12/site-packages/Tensile/Parallel.py", line 52, in pcallWithGlobalParamsMultiArg
    return f(*args)
           ^^^^^^^^
  File "/home/bargo/projects/rocm-setup/rocBLAS/build/virtualenv/lib/python3.12/site-packages/Tensile/TensileCreateLibrary.py", line 284, in buildSourceCodeObjectFile
    tPrint(2, "hipcc:" + " ".join(compileArgs))
                         ^^^^^^^^^^^^^^^^^^^^^
TypeError: sequence item 0: expected str instance, NoneType found
"""
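For readers following along, the failure in the traceback can be reproduced in isolation: `str.join` rejects any non-string item, so a `None` in the first slot of the argument list fails inside the logging call before the compiler is ever invoked. The snippet below is a minimal sketch, not Tensile's code; `format_command` is a hypothetical helper that shows one way to report the unset entry more clearly.

```python
def format_command(args):
    """Join a command line for logging, reporting unset entries explicitly.

    Hypothetical helper for illustration; it is not part of Tensile.
    """
    missing = [i for i, arg in enumerate(args) if arg is None]
    if missing:
        raise ValueError(
            f"compile command has unset entries at positions {missing}: {args!r}"
        )
    return " ".join(args)


# Shortened stand-in for the argument list printed in the issue.
compile_args = [None, "-x", "hip", "-O3"]

# Plain join reproduces the TypeError from the traceback.
try:
    " ".join(compile_args)
except TypeError as exc:
    join_error = str(exc)

# The guarded version fails earlier with a message naming the bad position.
try:
    format_command(compile_args)
except ValueError as exc:
    guard_error = str(exc)

print(join_error)
print(guard_error)
```

The guard does not fix the underlying problem (the compiler path was never resolved), it only makes the log point at the unset entry.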


I'm currently trying to resolve it; I hope I've found the main culprit.

Looking at this, I can confirm that which(cxxCompiler) is returning None:

printing our args as we have a None Type error at the moment [None, '-D__HIP_HCC_COMPAT_MODE__=1', '--cuda-device-only', '-x', 'hip', '-O3', '-I', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile', '-Xoffload-linker', '--build-id', '--offload-arch=gfx1010', '--offload-arch=gfx803', '--offload-arch=gfx1030', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile/TensileLibrary_Type_ZZ_Contraction_l_Ailk_BjlkC_Cijk_Dijk_fallback.cpp', '-c', '-o', '/home/bargo/projects/rocm-setup/rocBLAS/build/library/src/build_tmp/TENSILE/code_object_tmp/TensileLibrary_Type_ZZ_Contraction_l_Ailk_BjlkC_Cijk_Dijk_fallback.o'] launcher: [] hipFlags: ['-D__HIP_HCC_COMPAT_MODE__=1', '--cuda-device-only', '-x', 'hip', '-O3', '-I', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile', '-Xoffload-linker', '--build-id'] which(CxxCompiler) : None archFlags: ['--offload-arch=gfx1010', '--offload-arch=gfx803', '--offload-arch=gfx1030']

This happens regardless of the value of my CMAKE_CXX_COMPILER environment variable or the -DCMAKE_CXX_COMPILER option. Neither is considered, and CXX is ignored as well. What nice logic, man.
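As an illustration of the behaviour being asked for here, the sketch below resolves a compiler by first honouring the environment variables and then falling back to names on PATH, and it raises a clear error instead of returning None. This is an assumption about what a fix could look like, not Tensile's actual logic; `resolve_cxx_compiler`, its parameters, and the candidate order are hypothetical.

```python
import os
import shutil


def resolve_cxx_compiler(
    candidates=("amdclang++", "hipcc"),
    env_vars=("CMAKE_CXX_COMPILER", "CXX"),
):
    """Return an absolute or PATH-resolved compiler path, or raise.

    Hypothetical helper: the names and lookup order are assumptions
    made for this example, not Tensile's implementation.
    """
    # Prefer an explicit user choice from the environment.
    for var in env_vars:
        value = os.environ.get(var)
        if value:
            path = shutil.which(value)
            if path:
                return path

    # Fall back to well-known compiler names on PATH.
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path

    # Fail loudly here rather than passing None into the command line.
    raise FileNotFoundError(
        f"no C++ compiler found; checked env vars {env_vars} "
        f"and PATH candidates {candidates}"
    )


if __name__ == "__main__":
    try:
        print(resolve_cxx_compiler())
    except FileNotFoundError as exc:
        print(f"error: {exc}")
```

`shutil.which` returns None when nothing matches, which is the likely source of the None seen above; checking its result at the point of lookup keeps the error close to its cause.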

OK, for now I just symlinked my hipcc to amdclang++ to get myself going.
Here is how it looks after that change:

printing our args as we have a None Type error at the moment ['/opt/rocm/bin/amdclang++', '-D__HIP_HCC_COMPAT_MODE__=1', '--cuda-device-only', '-x', 'hip', '-O3', '-I', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile', '-Xoffload-linker', '--build-id', '--offload-arch=gfx1010', '--offload-arch=gfx803', '--offload-arch=gfx1030', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile/TensileLibrary_Type_CC_Contraction_l_AlikC_Bjlk_Cijk_Dijk_fallback.cpp', '-c', '-o', '/home/bargo/projects/rocm-setup/rocBLAS/build/library/src/build_tmp/TENSILE/code_object_tmp/TensileLibrary_Type_CC_Contraction_l_AlikC_Bjlk_Cijk_Dijk_fallback.o'] launcher: [] hipFlags: ['-D__HIP_HCC_COMPAT_MODE__=1', '--cuda-device-only', '-x', 'hip', '-O3', '-I', '/home/bargo/projects/rocm-setup/rocBLAS/build/Tensile', '-Xoffload-linker', '--build-id'] which(CxxCompiler) : /opt/rocm/bin/amdclang++ archFlags: ['--offload-arch=gfx1010', '--offload-arch=gfx803', '--offload-arch=gfx1030']

What are you trying to do? Compile for gfx1010, or tuning for gfx1010?

Well, I'm trying to compile for a modified gfx1010, not really tune it. I'm trying to replicate some gfx1030 builtins on gfx1010.

What is your current ROCm version and where did you get your ROCm packages from?

I'm asking because, assuming your LLVM/Clang are not from the amd-staging branch of ROCm/llvm-project and you're using the develop branch of Tensile and rocBLAS, even if you fix the current issue, you're almost definitely going to encounter compiler errors down the road. The develop branch of Tensile (and rocBLAS) usually contains new instructions or operations that are in amd-staging but not yet released. Therefore, if your system has, say, ROCm 6.0, you should make your modifications on the release/rocm-rel-6.0 branch of both Tensile and rocBLAS, to ensure you're starting from something that's already guaranteed to work.

Of course I'm using amd-staging, man :)

bargo@beta:~/projects/rocm-setup/llvm-project$ cat .git/config 
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
        url = https://github.com/ROCm/llvm-project.git
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "amd-staging"]
        remote = origin
        merge = refs/heads/amd-staging

What is your current ROCm version and where did you get your ROCm packages from?

I built everything from source.

Ah cool. When I've encountered these problems in the past, it was usually because some environment variables were missing. You may want to read through the code to find out which environment variables Tensile looks for, and feel free to reference the environment variables we use at Solus: https://github.com/getsolus/packages/blob/e09737c01f53b02bc51d027ff428d44ce37ac083/packages/r/rocblas/package.yml#L29. Note that /usr/lib64/llvm-rocm is our prefix directory for the ROCm LLVM, i.e. the DESTDIR when installing, so adjust accordingly for your install.

I think amdclang++ is already linked to clang++ in the packages for distro releases, so I don't think anyone would hit this issue, but maybe it's a good improvement. I will also close it.