[BUG] CUDA-10 library doesn't support the Turing-based RTX 2060?

Question

Closed this issue 4 years ago · 8 comments

Looks like the Turing-based RTX 2060 is too new for the cuda-10 library. It's apparently "sm_75" but the cuda-10 library only supports up to sm_72.

I'm trying to run the fwaccel-gpu example from Simon Marlow's book, which I patched so that it starts with:

{-# LANGUAGE CPP, BangPatterns #-}
{-# OPTIONS_GHC -Wall -fno-warn-name-shadowing #-}

module Main ( main, test {-, maxDistances -} ) where

import Prelude
import System.Environment
import Data.Array.Accelerate as A
import Data.Array.Accelerate.LLVM.PTX

(I removed the import of AccelerateCompat)

Runtime error:

$ stack run fwaccel-gpu -- 200
fwaccel-gpu:
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/AccelerateHS/accelerate/issues

ptxas - -o /home/sedwards/.cache/accelerate/accelerate-llvm-1.3.0.0/accelerate-llvm-ptx-1.3.0.0/llvm-hs-9.0.1/nvptx64-nvidia-cuda/sm75/rel/morp9202ae4c7dc569aa0ea6fb2c02b0efca3bfd980b6e440b3de34a094604d51ee0.sass -arch=sm_75 (exit 255)
ptxas fatal : Value 'sm_75' is not defined for option 'gpu-name'

CallStack (from HasCallStack):
internalError: Data.Array.Accelerate.LLVM.PTX.Compile:185:24
compileCUBIN: Data.Array.Accelerate.LLVM.PTX.Compile:123:20
compile: Data.Array.Accelerate.LLVM.PTX.Compile:88:22

resolver: lts-16.25

extra-deps:
- accelerate-1.3.0.0
- accelerate-llvm-ptx-1.3.0.0
- accelerate-llvm-1.3.0.0
- cuda-0.10.2.0
- nvvm-0.10.0.0

OS: Ubuntu 18.04.5 LTS
GPU: GeForce RTX 2060
CUDA: cuda-10

Answer 1 · 2020-12-11T09:42:30.000Z

I have a 2080ti so this definitely should work! What version of llvm are you using? I have llvm-9.

Answer 2 · 2020-12-11T10:06:07.000Z

Oh, I think you need cuda-10.1 installed (that's what I have), or later, although I haven't updated the bindings for cuda-11* yet, sorry. The error is coming from ptxas, which is part of the CUDA toolchain.
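
A quick way to test this is to ask the ptxas found on the PATH which CUDA release it belongs to. The sketch below assumes that `ptxas --version` prints a line containing `release X.Y`. The helper names `ptxas_release` and `supports_sm75` are made up for illustration, and the 10.0 threshold is taken from the CUDA 10.0 release notes, which list sm_75 as a new feature.

```shell
# Print the "X.Y" CUDA release found in `ptxas --version` output (read from stdin).
# Assumption: the output contains a line like "..., release 10.1, V10.1.243".
ptxas_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if the given release is 10.0 or newer.
# Returns 2 if the argument does not look like a version number.
supports_sm75() {
  major=${1%%.*}
  case $major in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$major" -ge 10 ]
}

# Only query ptxas if it is actually installed.
if command -v ptxas >/dev/null 2>&1; then
  rel=$(ptxas --version | ptxas_release)
  if supports_sm75 "$rel"; then
    echo "ptxas from CUDA $rel should accept -arch=sm_75"
  else
    echo "ptxas from CUDA ${rel:-unknown} is probably too old for sm_75"
  fi
else
  echo "no ptxas on PATH"
fi
```

If the reported release is older than 10.0, the `sm_75` error comes from the installed toolkit rather than from Accelerate.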

Answer 3 · 2020-12-11T10:12:39.000Z

I think I have cuda-10 installed, although I started by installing cuda-11, which gave link errors with the Haskell libraries. I'm also using LLVM 9 (standard under Ubuntu 18.04).

How does the system choose to send sm_75 to ptxas? I couldn't quickly find it in the code.

Answer 4 · 2020-12-11T11:44:08.000Z

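It's in the file Target.hs (https://github.com/AccelerateHS/accelerate-llvm/blob/master/accelerate-llvm-ptx/src/Data/Array/Accelerate/LLVM/PTX/Target.hs#L168). Your system is very similar to mine, so it should work.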
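
The `-arch=sm_75` flag in the error log is consistent with the architecture name being built from the device's compute capability (major 7, minor 5 for Turing cards such as the RTX 2060 and 2080 Ti). The sketch below only illustrates that naming convention. `sm_name` is a hypothetical helper, not a function from accelerate-llvm-ptx, and the actual logic in Target.hs may differ.

```shell
# Build a ptxas architecture name from a compute capability.
# $1 = major version, $2 = minor version (hypothetical helper for illustration).
sm_name() {
  printf 'sm_%d%d\n' "$1" "$2"
}

# Compute capability 7.5 gives the name that ptxas rejected in the log above.
sm_name 7 5
```

So the name passed to ptxas depends on the GPU alone, and whether it is accepted depends on the ptxas version.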

CUDA-10 indeed added support for sm_75: https://docs.nvidia.com/cuda/archive/10.0/cuda-toolkit-release-notes/index.html#cuda-general-new-features

Oh, if you upgraded your CUDA Toolkit after installing my CUDA bindings package, you might need to unregister (stack exec ghc-pkg -- unregister --force cuda) that package and reinstall it.
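
As a sketch, the unregister-and-reinstall step could be wrapped like this. The `run` and `rebuild_cuda_bindings` helpers and the `DRY_RUN` variable are made up for illustration, and using `stack build` as the reinstall step is an assumption about how the project is built.

```shell
# Execute a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Drop the registered cuda bindings and rebuild them against the
# toolkit that is installed now.
rebuild_cuda_bindings() {
  run stack exec ghc-pkg -- unregister --force cuda   # command from the reply above
  run stack build                                     # assumption: rebuilds the missing package
}

# Print the commands without running them.
DRY_RUN=1 rebuild_cuda_bindings
```

Dropping `DRY_RUN=1` runs the commands for real, from inside the project directory.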

Answer 5 · 2020-12-11T19:56:20.000Z

I think it's the nvidia-cuda-toolkit package, which is pegged to Ubuntu 18.04, that has an out-of-date ptxas executable that doesn't support sm_75. I'm unable to install the package for Ubuntu 20.04. Which Ubuntu release are you running?
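
To see where the ptxas executable comes from, something like the following may help. It is a rough sketch: `ptxas_origin` is a hypothetical helper, and the path conventions are assumptions (distribution packages usually install to /usr/bin, NVIDIA's own installers to /usr/local/cuda*).

```shell
# Classify a ptxas path by its likely origin (assumed path conventions).
ptxas_origin() {
  case $1 in
    /usr/local/cuda*/bin/ptxas) echo "nvidia-installer" ;;
    /usr/bin/ptxas)             echo "distro-package" ;;
    '')                         echo "not-found" ;;
    *)                          echo "other" ;;
  esac
}

path=$(command -v ptxas || true)
echo "ptxas: ${path:-none} ($(ptxas_origin "$path"))"

# On Debian/Ubuntu, dpkg can name the package that owns the file.
if [ -n "$path" ] && command -v dpkg >/dev/null 2>&1; then
  dpkg -S "$path" || true
fi
```

If dpkg reports nvidia-cuda-toolkit as the owner, the ptxas version is tied to whatever that Ubuntu release ships.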

Answer 6 · 2020-12-26T16:54:44.000Z

(Sorry for the late response.) I'm running Ubuntu 20.04 and CUDA 10.1.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal

Answer 7 · 2020-12-27T17:40:27.000Z

Thanks. I'm convinced my problem was that I was running Ubuntu 18.04. Consider the problem solved; I just have to upgrade my OS.

Answer 8 · 2021-01-08T14:32:46.000Z

Okay, closing for now then. Let us know if you want any help and I'll do my best.