shyamsn97/mario-gpt

RuntimeError: invalid multinomial distribution

TheFiZi opened this issue · 13 comments

Using the prompts: many pipes, many enemies, no blocks, low elevation

shape: torch.Size([1, 673]), torch.Size([1, 1304]) first: 56, last: 88:  93%|██████████████████████████████████████████████████████████████▎    | 1303/1400 [02:43<00:12,  7.97it/s]
Traceback (most recent call last):
  File "/home/me/apps/mariogpt/capturePlay.py", line 38, in <module>
    generated_level = mario_lm.sample(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/lm/gpt.py", line 54, in sample
    return sampler(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 248, in __call__
    return self.sample(*args, **kwargs)
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 223, in sample
    next_tokens, encoder_hidden_states = self.step(
  File "/home/me/apps/mariogpt/lib/python3.10/site-packages/mario_gpt/sampler.py", line 172, in step
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)
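
For reference, here is a minimal standalone sketch (not the mario-gpt sampler itself) that trips the same error: torch.multinomial raises it whenever a row of the probability tensor sums to zero, which is presumably what the sampler's probs collapse to here.

import torch

# An all-zero probability row is enough to reproduce the message on CPU:
# "RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)"
probs = torch.zeros(1, 10)
torch.multinomial(probs, num_samples=1)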

Oh interesting, what temperature value are you using?

Defaults

generated_level = mario_lm.sample(
    prompts=prompts,
    num_steps=1400,
    temperature=2.0,
    use_tqdm=True
)

How frequently does this happen? I haven't really seen this, but it looks like some of the logit values are NaNs.
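
If it is NaNs, a wrapper like the sketch below (hypothetical, not the actual code in sampler.py; safe_multinomial is a made-up name) would either confirm it or avoid the crash by falling back to greedy decoding.

import torch

def safe_multinomial(probs: torch.Tensor, num_samples: int = 1) -> torch.Tensor:
    # Hypothetical guard, not part of mario-gpt: if the distribution contains
    # NaN/inf or sums to zero, fall back to greedy argmax instead of crashing.
    if not torch.isfinite(probs).all() or probs.sum() <= 0:
        return probs.nan_to_num(0.0).argmax(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=num_samples)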

This is the first time in ~30-40 runs. It could just be something I'm doing wrong, to be honest. I can let you know if I see it again. Is there anything more I could capture that would be helpful if it happens again?

Not sure actually haha, never really encountered this, especially with temperature 2.0. Maybe a torch update is needed? What version are you using right now?

I am using

Name: torch
Version: 1.13.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/sa_schewee/venv/thegoose/lib/python3.10/site-packages
Requires: nvidia-cublas-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, typing-extensions
Required-by: mario-gpt

There does not appear to be a newer version available:

(thegoose) me@nightshade:~$ pip3 install --upgrade torch
Requirement already satisfied: torch in ./venv/thegoose/lib/python3.10/site-packages (1.13.1)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in ./venv/thegoose/lib/python3.10/site-packages (from torch) (8.5.0.96)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in ./venv/thegoose/lib/python3.10/site-packages (from torch) (11.7.99)
Requirement already satisfied: typing-extensions in ./venv/thegoose/lib/python3.10/site-packages (from torch) (4.5.0)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in ./venv/thegoose/lib/python3.10/site-packages (from torch) (11.7.99)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in ./venv/thegoose/lib/python3.10/site-packages (from torch) (11.10.3.66)
Requirement already satisfied: setuptools in ./venv/thegoose/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (59.6.0)
Requirement already satisfied: wheel in ./venv/thegoose/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (0.38.4)
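
For completeness, the same information as torch itself reports it at runtime (standard attributes, nothing mario-gpt-specific):

import torch

print(torch.__version__)          # 1.13.1 here
print(torch.version.cuda)         # CUDA version this torch build was compiled against
print(torch.cuda.is_available())  # whether torch can actually see the GPU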

Yeah, I feel like this and #12 might be related somehow; it could be some weird CUDA issue. I'll look into reproducing this, but I hope this doesn't happen too frequently for you.

Sadly, 3 times in a row today. I haven't had a successful run yet today.

And does it only happen to you on GPU? Or is it both CPU and GPU?

No issues on CPU so far, and I have generated multiple images successfully. I wonder if it's a memory issue: my GPU only has 2GB of VRAM and my VM has 4GB of RAM.
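
For reference, this is roughly how I'm switching between CPU and GPU, following the .to(device) pattern from the README (the prompt string is the one from the top of this issue):

import torch
from mario_gpt import MarioLM

mario_lm = MarioLM()
# Runs on CPU by default; move to CUDA only when testing the GPU path.
device = torch.device("cpu")  # or torch.device("cuda")
mario_lm = mario_lm.to(device)

generated_level = mario_lm.sample(
    prompts=["many pipes, many enemies, no blocks, low elevation"],
    num_steps=1400,
    temperature=2.0,
    use_tqdm=True,
)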

I have access to a 3080. Will do some tests with that and see if I can replicate the problem.

I'm testing with my laptop right now, which has a Quadro (4GB), and it seems to be running fine. Quite strange haha

I am going to close this off as an out-of-memory issue. I ran the default generation example and it peaked at ~6GB of VRAM.

The Quadro I was running it on only has 2GB.
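
For anyone who wants to check their own headroom, a rough way to measure that peak (this only counts tensors torch allocates, so the true footprint including the CUDA context is somewhat higher):

import torch

torch.cuda.reset_peak_memory_stats()
# ... run mario_lm.sample(...) here ...
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated VRAM: {peak_gib:.2f} GiB")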