While running the clip_guided notebook in CPU mode I get: "RuntimeError - Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.FloatTensor instead"
illtellyoulater opened this issue · 8 comments
When I run the clip_guided notebook in CPU mode, I get the following error at the "Sample from the base model" cell:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9272/4093479580.py in <module>
20 # Sample from the base model.
21 model.del_cache()
---> 22 samples = diffusion.p_sample_loop(
23 model,
24 (batch_size, 3, options["image_size"], options["image_size"]),
c:\users\alf\downloads\glide-text2im\glide_text2im\gaussian_diffusion.py in p_sample_loop(self, model, shape, noise, clip_denoised, denoised_fn, cond_fn, model_kwargs, device, progress)
387 """
388 final = None
--> 389 for sample in self.p_sample_loop_progressive(
390 model,
391 shape,
c:\users\alf\downloads\glide-text2im\glide_text2im\gaussian_diffusion.py in p_sample_loop_progressive(self, model, shape, noise, clip_denoised, denoised_fn, cond_fn, model_kwargs, device, progress)
439 t = th.tensor([i] * shape[0], device=device)
440 with th.no_grad():
--> 441 out = self.p_sample(
442 model,
443 img,
c:\users\alf\downloads\glide-text2im\glide_text2im\gaussian_diffusion.py in p_sample(self, model, x, t, clip_denoised, denoised_fn, cond_fn, model_kwargs)
351 ) # no noise when t == 0
352 if cond_fn is not None:
--> 353 out["mean"] = self.condition_mean(cond_fn, out, x, t, model_kwargs=model_kwargs)
354 sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
355 return {"sample": sample, "pred_xstart": out["pred_xstart"]}
c:\users\alf\downloads\glide-text2im\glide_text2im\respace.py in condition_mean(self, cond_fn, *args, **kwargs)
95
96 def condition_mean(self, cond_fn, *args, **kwargs):
---> 97 return super().condition_mean(self._wrap_model(cond_fn), *args, **kwargs)
98
99 def condition_score(self, cond_fn, *args, **kwargs):
c:\users\alf\downloads\glide-text2im\glide_text2im\gaussian_diffusion.py in condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs)
287 This uses the conditioning strategy from Sohl-Dickstein et al. (2015).
288 """
--> 289 gradient = cond_fn(x, t, **model_kwargs)
290 new_mean = p_mean_var["mean"].float() + p_mean_var["variance"] * gradient.float()
291 return new_mean
c:\users\alf\downloads\glide-text2im\glide_text2im\respace.py in __call__(self, x, ts, **kwargs)
122 new_ts_2 = map_tensor[ts.ceil().long()]
123 new_ts = th.lerp(new_ts_1, new_ts_2, frac)
--> 124 return self.model(x, new_ts, **kwargs)
c:\users\alf\downloads\glide-text2im\glide_text2im\clip\model_creation.py in cond_fn(x, t, grad_scale, **kwargs)
57 with torch.enable_grad():
58 x_var = x.detach().requires_grad_(True)
---> 59 z_i = self.image_embeddings(x_var, t)
60 loss = torch.exp(self.logit_scale) * (z_t * z_i).sum()
61 grad = torch.autograd.grad(loss, x_var)[0].detach()
c:\users\alf\downloads\glide-text2im\glide_text2im\clip\model_creation.py in image_embeddings(self, images, t)
47
48 def image_embeddings(self, images: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
---> 49 z_i = self.image_encoder((images + 1) * 127.5, t)
50 return z_i / (torch.linalg.norm(z_i, dim=-1, keepdim=True) + 1e-12)
51
~\.conda\envs\glide-text2im\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
c:\users\alf\downloads\glide-text2im\glide_text2im\clip\encoders.py in forward(self, image, timesteps, return_probe_features)
483 ) -> torch.Tensor:
484 n_batch = image.shape[0]
--> 485 h = self.blocks["input"](image, t=timesteps)
486
487 for i in range(self.n_xf_blocks):
~\.conda\envs\glide-text2im\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
c:\users\alf\downloads\glide-text2im\glide_text2im\clip\encoders.py in forward(self, x, t)
124 self.pred_state[None, None].expand(x.shape[0], -1, -1)
125 if self.n_timestep == 0
--> 126 else F.embedding(cast(torch.Tensor, t), self.w_t)[:, None]
127 )
128 x = torch.cat((sot, x), dim=1) + self.w_pos[None]
~\.conda\envs\glide-text2im\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1850 # remove once script supports set_grad_enabled
1851 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1852 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1853
1854
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.FloatTensor instead (while checking arguments for embedding)
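For reference, the trace shows new_ts being produced by th.lerp, which returns a float tensor, and that tensor eventually reaching F.embedding, which only accepts integer (Long) indices. The failure reproduces in isolation; here is a minimal standalone sketch (the names are made up, not code from the repo):

import torch
import torch.nn.functional as F

w_t = torch.randn(10, 4)  # stand-in embedding table with 10 rows
t = torch.lerp(torch.tensor([2.0]), torch.tensor([3.0]), 0.5)  # float tensor, like new_ts

# F.embedding(t, w_t)             # raises the same "scalar type Long" RuntimeError
z = F.embedding(t.long(), w_t)    # works once the indices are cast to Long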
Can anyone help?
Thanks!
Not sure what is happening here, but you should aim for a GPU if possible.
See the comment in the notebook:
# This notebook supports both CPU and GPU.
# On CPU, generating one sample may take on the order of 20 minutes.
# On a GPU, it should be under a minute.
CPU mode takes roughly 20 times longer than GPU mode.
I know, but my current GPU doesn't have enough VRAM... that's why I was running in CPU mode.
I'm getting a new GPU soon, but I think it would still be cool if this worked on CPU...
Yes, sure. In the meantime, try to use a free GPU on Google Colab.
@woctezuma I finally got hold of a new GPU with 6 GB of VRAM... so I am now running the clip_guided notebook again in GPU mode, but I am seeing exactly the same error I documented above...
Thanks! I already saw them, but I don't have the ML and library knowledge needed to make proper use of them...
I also tried somewhat blindly playing with those types and their conversions, but without success...
Honestly, I doubt I can come up with something useful on my own... 🤷‍♂️
It could be just a simple change to this line:
glide-text2im/glide_text2im/clip/encoders.py, lines 123 to 127 at commit 9cc8e56
You could try to replace:
F.embedding(cast(torch.Tensor, t), self.w_t)
with either:
F.embedding(cast(torch.Tensor, t.long()), self.w_t)
or:
F.embedding(cast(torch.Tensor, t).long(), self.w_t)
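In context, the patched conditional would look roughly like this (reconstructed from the traceback above; the assignment to sot is inferred from the torch.cat call, so treat it as a sketch rather than the exact file contents):

# glide_text2im/clip/encoders.py, around lines 123-127, with the .long() cast added
sot = (
    self.pred_state[None, None].expand(x.shape[0], -1, -1)
    if self.n_timestep == 0
    else F.embedding(cast(torch.Tensor, t).long(), self.w_t)[:, None]
)
x = torch.cat((sot, x), dim=1) + self.w_pos[None]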
Ok, thanks! Now it works, at least in CPU mode!
In GPU mode a completely black image is generated (at some point the tensors become NaN), but I'll open another thread for that, as it must be caused by a different problem.
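In the meantime, to confirm where the NaNs first appear, I'll probe the intermediate tensors with a small helper like this (just a sketch of my own, not from the repo):

import torch

def report_nans(name: str, x: torch.Tensor) -> None:
    # Count and report NaN entries in a tensor while debugging.
    n = torch.isnan(x).sum().item()
    if n:
        print(f"{name}: {n}/{x.numel()} NaN values")

# e.g. right after the sampling cell: report_nans("samples", samples)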