uncomplicate/deep-diamond

mnist test fails with CUDNN_STATUS_NOT_SUPPORTED

behrica opened this issue · 12 comments

I have the mnist-classification-test failing:

OS: arch-linux
CUDA: cuda_11.6.1_510.47.03_linux
CUDNN: cudnn-linux-x86_64-8.4.0.27_cuda11.6

All the tests in uncomplicate.diamond.functional.mnist.mnist-classification-test fail
with: CUDNN_STATUS_NOT_SUPPORTED

All tests in test/uncomplicate/diamond/internal/cudnn/

Any idea what that could be?

What is weird is that all tensors seem to contain only "0"...

(frequencies train-labels)
->> {0 60000}

What is your hardware?

Can you please paste the output of nvidia-smi?

  • Did you install the cudnn package in addition to CUDA (through pacman)?

The issue was a different spelling in the filenames of the input files.
map-tensor does not fail on that, but reads all 0s...

These files are full of images of white numbers on black background. Most of the numbers are zeroes (but not all of course). But, if you're only looking at the first 100 numbers or so, these are 0.

No, I had a "wrong spelling" in the file names. All labels were 0 as well.
Somehow map-tensor ignores "file not found" errors.

This does not fail, but produces a tensor of all 0:

(def train-images-file (random-access "asdòlsadaòd"))
(def train-images (map-tensor train-images-file [60000 1 28 28] :uint8 :nchw :read 16))

"asdòlsadaòd" has to be the actual file containing the MNIST images dataset (in the binary matrix form explained on the MNIST data site and in the DLFP book).

Yes, the original issue arose because I had a misspelled filename.
So the filename I gave did not exist on disk.

But to my surprise, map-tensor does not fail when given a non-existent file.
It returns tensors of all zeros, which in the end led to [CUDNN_STATUS_NOT_SUPPORTED].

So we can close this here, but I would suggest making map-tensor fail on non-existent files.
This could help to avoid future confusion.

I would say it works as intended. Here's why:

  1. map-tensor does not deal with file management per se. It is up to the caller to provide a valid file. And you did provide a valid file. How so, if you mistyped the name?
  2. train-images-file is a RandomAccessFile (standard Java 7). If you check your code, you'll see that the file object exists after random-access returns. How so?
  3. As explained in the standard Java docs, https://docs.oracle.com/javase/7/docs/api/java/io/RandomAccessFile.html#RandomAccessFile(java.lang.String,%20java.lang.String), the constructor is going to create a file with the provided name if one doesn't exist. Only if it can't create a new file with that name, the exception is going to be thrown: "FileNotFoundException - if the mode is "r" but the given string does not denote an existing regular file, or if the mode begins with "rw" but the given string does not denote an existing, writable regular file and a new regular file of that name cannot be created, or if some other error occurs while opening or creating the file".
  4. You don't have to use the random-access function to grab the file with your data. It is important that you provide a file that can be mapped to. Use any Java/Clojure method that does that in a way that satisfies your constraints and requirements.
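The constructor behaviour quoted in point 3 can be checked directly with plain Java interop. The following is a minimal sketch, not code from deep-diamond: the scratch path `missing-path` and the helper `opens?` are hypothetical names introduced only for this illustration, and the sketch deliberately avoids the library's own `random-access` function.

```clojure
(import '(java.io File RandomAccessFile FileNotFoundException))

;; Hypothetical scratch path that does not exist yet (illustration only).
(def missing-path
  (str (System/getProperty "java.io.tmpdir") File/separator
       "no-such-mnist-file-" (System/nanoTime)))

(defn opens?
  "Hypothetical helper: true if RandomAccessFile can open `path` in `mode`,
  false if the constructor throws FileNotFoundException."
  [^String path ^String mode]
  (try
    (with-open [_ (RandomAccessFile. path mode)]
      true)
    (catch FileNotFoundException _
      false)))

;; Mode "r" on a missing file: the constructor throws.
(def read-only-result (opens? missing-path "r"))

;; Mode "rw" on the same missing file: the constructor creates it.
(def read-write-result (opens? missing-path "rw"))

;; The file created by "rw" is empty.
(def created-length (.length (File. missing-path)))

;; Clean up the file that "rw" created.
(.delete (File. missing-path))
```

Here `read-only-result` is false while `read-write-result` is true, and the file created in "rw" mode has length 0. An empty, freshly created file would be consistent with the all-zero tensors reported above, although how map-tensor handles such a file is not shown here.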

Finally I understand the confusion.

This line of the example:

(def train-images-file (random-access "data/mnist/train-images-idx3-ubyte"))
opens the file in mode "rw" (and not in "r", as I would have expected).

This is why the Java IO functions do not fail, even when given a non-existent file name.

So probably there is nothing wrong anywhere (except that I should open data files in mode "r" to avoid issues in case of wrong file names):

(def train-images-file (random-access "data/mnist/train-images-idx3-ubyte" "r"))

Theoretically, yes. The mode should be :read in Clojure. However, try it out first, because I think that even that mode is not going to throw any exception.
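Whichever mode the file ends up being opened in, a caller-side check can catch a misspelled name before anything is mapped. The sketch below is one possible approach, not part of deep-diamond: `require-data-file` and `train-images-bytes` are hypothetical names, and the expected size is simply the 16-byte offset plus the [60000 1 28 28] shape of :uint8 values used in the map-tensor call above.

```clojure
(import '(java.io File FileNotFoundException))

(defn require-data-file
  "Hypothetical guard: returns `path` if it names an existing regular file
  of exactly `expected-bytes` bytes, and throws otherwise."
  [^String path ^long expected-bytes]
  (let [f (File. path)]
    (cond
      ;; A misspelled name ends up here instead of silently creating a file.
      (not (.isFile f))
      (throw (FileNotFoundException. (str "No such data file: " path)))

      ;; A truncated or wrong file ends up here.
      (not= expected-bytes (.length f))
      (throw (ex-info "Unexpected data file size"
                      {:path path
                       :expected expected-bytes
                       :actual (.length f)}))

      :else path)))

;; 16-byte header plus 60000 images of 1 x 28 x 28 unsigned bytes.
(def train-images-bytes (+ 16 (* 60000 1 28 28)))
```

Calling `(require-data-file "data/mnist/train-images-idx3-ubyte" train-images-bytes)` before opening the file would then fail loudly on a wrong name or a wrong size, instead of producing a tensor of zeros.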