How is the guide image being passed into ControlNet?
venkateshtata opened this issue · 2 comments
generator = torch.Generator(device=accelerator.device).manual_seed(args.seed)
images = []
for _ in range(args.num_validation_images):
    with torch.no_grad():
        # Pull the next validation batch, restarting the iterator once it is exhausted.
        try:
            batch = next(val_iter)
        except StopIteration:
            val_iter = iter(val_dataloader)
            batch = next(val_iter)
        target = batch["pixel_values"].to(dtype=weight_dtype)
        guide = batch["guide_values"].to(accelerator.device)
        # Run the guide image through the control model; the return value is discarded.
        _ = control_lora(guide).control_states
        image = pipeline(args.validation_prompt, num_inference_steps=30, generator=generator).images[0]
        image = dataset_cls.cat_input(image, target, guide)
        images.append(image)
- In the code snippet above, how is the guide image being passed as input? As far as I can see, only the validation_prompt is passed to the pipeline, along with the initialized generator.
- Also, does the ControlLoRA model use pre-trained diffusion model weights before the LoRA adapter is injected?
- When I wrote the ControlLoRA code, the ControlNet pipeline did not yet exist in diffusers, so calling control_lora(guide) saves the extracted hidden states as temporary variables inside the attention LoRA layers. Those temporary variables are then read when the attention layers run during the pipeline call, which is how the guide image influences generation even though pipeline(...) only receives the prompt and the generator. (A rough sketch of this mechanism is below.)
- In this version, the control part uses a series of downsampling convolutions to extract features and then injects those features into the LoRA layers. If you use my control-lora-v2, it takes a ControlNet-like approach instead: it reuses pretrained diffusion weights and uses LoRA to adjust the weights of the control part.
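A minimal sketch of that stash-then-consume pattern, assuming hypothetical names: ControlLoRACrossAttention, ControlLoRA, control_state, and the layer sizes are all made up for illustration, and the real repository returns an object exposing control_states rather than a bare tensor, which is simplified away here.

import torch
import torch.nn as nn
from typing import Optional


class ControlLoRACrossAttention(nn.Module):
    """Attention block with a LoRA adapter that also mixes in a cached control state."""

    def __init__(self, dim, rank=4):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Low-rank adapter on the query path.
        self.lora_down = nn.Linear(dim, rank, bias=False)
        self.lora_up = nn.Linear(rank, dim, bias=False)
        # Temporary variable that the control module fills in before the UNet runs.
        self.control_state: Optional[torch.Tensor] = None

    def forward(self, hidden_states):
        if self.control_state is not None:
            # Inject the cached guide features: this is how the guide image
            # influences generation even though pipeline(...) never receives it.
            hidden_states = hidden_states + self.control_state
        q = self.to_q(hidden_states) + self.lora_up(self.lora_down(hidden_states))
        k, v = self.to_k(hidden_states), self.to_v(hidden_states)
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v


class ControlLoRA(nn.Module):
    """Extracts features from the guide image and stashes them in the attention layers."""

    def __init__(self, attn_layers, dim):
        super().__init__()
        self.attn_layers = nn.ModuleList(attn_layers)
        # A small stack of downsampling convolutions, as described for v1.
        self.extractor = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, guide):
        feats = self.extractor(guide)              # (B, dim, H/4, W/4)
        states = feats.flatten(2).transpose(1, 2)  # (B, tokens, dim)
        for layer in self.attn_layers:
            layer.control_state = states           # cache for the later attention call
        return states


# Calling the control module first caches the guide features; the attention layer
# then picks them up when it is invoked (inside the real pipeline, by the UNet).
attn = ControlLoRACrossAttention(dim=64)
control_lora = ControlLoRA([attn], dim=64)
guide = torch.randn(1, 3, 64, 64)                 # stand-in for batch["guide_values"]
_ = control_lora(guide)                           # side effect: attn.control_state is set
out = attn(torch.randn(1, 16 * 16, 64))           # attention now sees the guide features

The key point is that control_lora(guide) is called purely for its side effect of populating the attention layers' cached states, which is why the validation loop above discards its return value.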
Thank you for the clarification!