pixeli99/SVD_Xtend

Question about the encoder_hidden_states

WayneML opened this issue · 4 comments

When I tried to run the script, I found that encoder_hidden_states was all zeros.

if args.conditioning_dropout_prob is not None:
    random_p = torch.rand(
        bsz, device=latents.device, generator=generator)
    # Sample masks for the edit prompts.
    prompt_mask = random_p < 2 * args.conditioning_dropout_prob
    prompt_mask = prompt_mask.reshape(bsz, 1, 1)
    # Final text conditioning.
    null_conditioning = torch.zeros_like(encoder_hidden_states)
    encoder_hidden_states = torch.where(
        prompt_mask, null_conditioning.unsqueeze(1), encoder_hidden_states.unsqueeze(1))
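For reference, this block appears to implement conditioning dropout (as used for classifier-free guidance): each sample's conditioning is randomly replaced with zeros. A standalone sketch of the same masking logic, with dummy shapes and a dummy dropout probability that are not taken from the repo, shows what happens to a batch:

import torch

# Dummy shapes and probability for illustration only.
bsz, dim = 4, 1024
conditioning_dropout_prob = 0.1

encoder_hidden_states = torch.randn(bsz, dim)
random_p = torch.rand(bsz)

# Each sample is independently dropped with probability 2 * p.
prompt_mask = (random_p < 2 * conditioning_dropout_prob).reshape(bsz, 1, 1)
null_conditioning = torch.zeros_like(encoder_hidden_states)

dropped = torch.where(
    prompt_mask, null_conditioning.unsqueeze(1), encoder_hidden_states.unsqueeze(1))

print(prompt_mask.flatten())           # e.g. tensor([False,  True, False, False])
print(dropped.abs().sum(dim=(1, 2)))   # zero only for the masked samples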

I found something strange in this code block: "random_p = torch.rand(bsz, device=latents.device, generator=generator)" always makes random_p a one-dimensional tensor, and when you choose a batch size of 1 it holds only a single value. As a result, prompt_mask is a single Boolean rather than one Boolean per sample.
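To illustrate, a tiny standalone snippet (the dropout probability here is arbitrary, not from the repo):

import torch

# With batch size 1, random_p is a one-element tensor in [0, 1), so the
# comparison yields a single Boolean flag instead of one flag per sample.
bsz = 1
conditioning_dropout_prob = 0.1

random_p = torch.rand(bsz)                               # shape (1,)
prompt_mask = random_p < 2 * conditioning_dropout_prob   # shape (1,)
prompt_mask = prompt_mask.reshape(bsz, 1, 1)             # shape (1, 1, 1)

print(random_p, prompt_mask.flatten())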

Also, is this block still appropriate for the image-to-video task? It looks like it was written for text-to-image.

Hi, I didn't quite understand what you meant. Are you asking why the encoder_hidden_states need to be replaced with zeros?

Can the encoder_hidden_states be replaced with a text embedding for text-to-video tasks?
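For example, something along these lines (a rough sketch only; the checkpoint name, the use of the projected CLIP text embedding, and whether its dimension matches what the UNet expects are all assumptions, not something the repo provides):

import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

# Hypothetical sketch: build a (bsz, 1, dim) text embedding and feed it to the
# UNet as encoder_hidden_states in place of the CLIP image embedding.
model_id = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"  # assumed checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModelWithProjection.from_pretrained(model_id)

prompts = ["a car driving along a coastal road"]
inputs = tokenizer(prompts, padding=True, return_tensors="pt")
with torch.no_grad():
    text_embeds = text_encoder(**inputs).text_embeds   # (bsz, dim)

encoder_hidden_states = text_embeds.unsqueeze(1)        # (bsz, 1, dim)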