forever208/DDPM-IP

A Theoretical Question

Closed this issue · 15 comments

In the diffusion model papers, we all assume the real image $\textbf{x}_0 \sim q(\textbf{x}_0)$, but I haven't seen an exact definition of $q(\textbf{x}_0)$. So I wonder what exactly $q(\textbf{x}_0)$ is. Is it the distribution function of $\textbf{x}_0$? If so, how do we calculate the distribution function of a single image? Thank you!

q(x_0) stands for the whole data distribution, i.e., the distribution of your training dataset.
We cannot explicitly express q(x_0) (we do not know whether it is Gaussian or some other distribution); we can only draw samples from the data distribution q(x_0).
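
In practice, "drawing a sample from q(x_0)" simply means picking a random image from the training set, i.e. sampling from the empirical data distribution. A minimal sketch of this idea (the file name and array key below are illustrative assumptions, not part of the repo):

import numpy as np

# Hypothetical training set: N images of shape 32x32x3, e.g. scaled to [-1, 1]
train_images = np.load("cifar10_train.npz")["arr_0"]

def sample_x0(batch_size):
    # Draw x_0 ~ q(x_0): pick random images from the empirical data distribution
    idx = np.random.randint(0, len(train_images), size=batch_size)
    return train_images[idx]

x0 = sample_x0(16)  # a batch of real images, i.e. samples from q(x_0)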

I see, thank you! And in $\textbf{x}_t \sim q(\textbf{x}_t|\textbf{x}_{t-1})=\mathcal{N}(\textbf{x}_t;\sqrt{1-\beta_t}\textbf{x}_{t-1},\beta_t \textbf{I})$, the function $q(\textbf{x}_t|\textbf{x}_{t-1})$ is the conditional distribution of $\textbf{x}_t$ given $\textbf{x}_{t-1}$. Am I right? Thanks.

yes
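
For concreteness, a single forward step drawn from that Gaussian can be sketched in NumPy as follows (illustrative only, not the repo's actual implementation):

import numpy as np

def forward_step(x_prev, beta_t):
    # Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)
    noise = np.random.randn(*x_prev.shape)  # epsilon ~ N(0, I)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

# Example: a dummy 32x32x3 "image" scaled to [-1, 1]
x_prev = np.random.uniform(-1, 1, size=(32, 32, 3))
x_t = forward_step(x_prev, beta_t=0.02)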

Thank you. Also, for example, I'm working on the CIFAR-10 dataset. Then the dimension of $\textbf{x}_0, \cdots, \textbf{x}_t$ is 32×32×3, right?

yes

Thank you! I wonder what you mean by "each FID value is computed using T = 1000 sampling steps". Does it refer to --diffusion_steps 1000 in the command below? Thanks.

mpirun python scripts/image_sample.py \
--image_size 32 --timestep_respacing 100 \
--model_path PATH_TO_CHECKPOINT \
--num_channels 128 --num_head_channels 32 --num_res_blocks 3 --attention_resolutions 16,8 \
--resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.3 \
--diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True --batch_size 256 --num_samples 50000

In Figure 3 of your paper, you calculated FID scores using T = 1000 sampling steps.

The command above uses 100 sampling steps. Figure 3 will be updated in the paper later.

Got it. Could you tell me which parameter determines the number of sampling steps in the code below? Thank you.

mpirun python scripts/image_sample.py \
--image_size 32 --timestep_respacing 100 \
--model_path PATH_TO_CHECKPOINT \
--num_channels 128 --num_head_channels 32 --num_res_blocks 3 --attention_resolutions 16,8 \
--resblock_updown True --use_new_attention_order True --learn_sigma True --dropout 0.3 \
--diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True --batch_size 256 --num_samples 50000

--timestep_respacing 100

Thank you. I'm still a little confused about the notation in the paper. You mentioned in the paper "When training, we always use $T = 1000$ steps for all the models. At inference time, the results reported with $T^{\prime} < T$ sampling steps have been obtained using the respacing technique." So here, $T = 1000$ refers to diffusion_steps 1000, and $T^{\prime}$ refers to the parameter timestep_respacing. Am I right? Thanks.

yes
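
As a rough illustration of the respacing idea (a simplified sketch, not necessarily the exact timestep-selection logic used in the codebase), sampling with $T^{\prime} < T$ keeps roughly $T^{\prime}$ evenly spaced timesteps out of the original $T$ and runs the reverse process only on that subset:

import numpy as np

def respaced_timesteps(T=1000, T_prime=100):
    # Pick ~T' evenly spaced timesteps out of the original T;
    # T corresponds to --diffusion_steps and T' to --timestep_respacing.
    return np.linspace(0, T - 1, T_prime).round().astype(int)

steps = respaced_timesteps(1000, 100)
print(steps[:5], steps[-5:])  # first and last few of the kept timesteps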

Thanks! By the way, I'm trying to train a new model on the MNIST dataset using your code, and I've created a notebook to download MNIST and convert the training set to an npz file. The only issue I have is that the dimension of the images in my npz file is 28×28×3, but the default dimension of MNIST images is 28×28×1.

So I wonder if this discrepancy will influence the training of DDPM-IP. Here is the Colab notebook. Thank you.

The dimension of my npz file: (screenshot showing shape 28×28×3)

The default dimension: (screenshot showing shape 28×28×1)
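
If the extra channel dimension turns out to matter, one simple workaround (a hedged sketch, assuming the three channels are identical copies of the grayscale values and that the array was saved under NumPy's default key; the file names are illustrative) is to collapse the npz back to a single channel before training:

import numpy as np

data = np.load("mnist_train_rgb.npz")["arr_0"]  # assumed shape (N, 28, 28, 3)

# Verify the three channels are duplicates, then keep only one of them
assert np.array_equal(data[..., 0], data[..., 1]) and np.array_equal(data[..., 0], data[..., 2])
gray = data[..., :1]  # shape (N, 28, 28, 1)

np.savez("mnist_train_gray.npz", gray)
print(gray.shape)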

Hi, I also wonder what total_batch_size was when you trained on the CelebA dataset. I guess total_batch_size is 8*16=128 since there are two nodes. And how long does the training process take? Thank you.

The code for CelebA 64x64 training:

mpiexec -n 16  python scripts/image_train.py --input_pertub 0.1 \
--data_dir PATH_TO_DATASET \
--image_size 64 --use_fp16 True --num_channels 192 --num_head_channels 64 --num_res_blocks 3 \
--attention_resolutions 32,16,8 --resblock_updown True --use_new_attention_order True \
--learn_sigma True --dropout 0.1 --diffusion_steps 1000 --noise_schedule cosine --use_scale_shift_norm True \
--rescale_learned_sigmas True --schedule_sampler loss-second-moment --lr 1e-4 --batch_size 16

Yes, total_batch_size = 8*16 = 128 is correct. Training on CelebA takes 4-5 days using 16 V100 GPUs.