different setup of input_hint_block compared to paper?
liren-jin opened this issue · 1 comment
Hi, I noticed that the implementation of the tiny network converting control images into feature space is different from the structure mentioned in the paper: "In particular, we use a tiny network E(·) of four convolution layers with 4 × 4 kernels and 2 × 2 strides (activated by ReLU, using 16, 32, 64, 128 channels respectively)". The corresponding implementation should be here, right? (Correct me if I am wrong.)
Lines 147 to 163 in ed85cd1
a tiny network E(·)
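For reference, here is a minimal sketch of the structure as the paper describes it (four 4×4 convolutions with 2×2 strides, ReLU, channels 16/32/64/128), not the repository's actual `input_hint_block` implementation; the module and class names are illustrative only:

```python
import torch
import torch.nn as nn


class TinyHintEncoder(nn.Module):
    """Hypothetical sketch of the tiny network E(.) from the paper's text:
    four 4x4 convolutions with stride 2, ReLU activations, and
    16/32/64/128 output channels. This is not the code in the repo."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        layers = []
        prev = in_channels
        for ch in (16, 32, 64, 128):
            # 4x4 kernel, stride 2, padding 1 halves the spatial resolution
            layers.append(nn.Conv2d(prev, ch, kernel_size=4, stride=2, padding=1))
            layers.append(nn.ReLU(inplace=True))
            prev = ch
        self.encoder = nn.Sequential(*layers)

    def forward(self, hint: torch.Tensor) -> torch.Tensor:
        # e.g. a 3x512x512 control image -> 128x32x32 conditional features
        return self.encoder(hint)


if __name__ == "__main__":
    x = torch.randn(1, 3, 512, 512)
    print(TinyHintEncoder()(x).shape)  # torch.Size([1, 128, 32, 32])
```

Comparing this against the block at lines 147 to 163 should make it clear where the released code diverges from the paper's description.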
Hello, have you figured this out? Since I'm not very familiar with ControlNet, I haven't fully understood the code yet. This code corresponds to the tiny network E(·) in the paper, which converts the condition image into conditional features, right? I hope you can help me clarify it. Thank you very much!