Issues
Clarification of speed
#45 opened by zehongs - 10
About VAE channels
#56 opened by pokameng - 6
Questions about causal methods
#27 opened by Tom-zgt - 2
About the positional encodings of the diffusion MLP.
#54 opened by whwjdqls - 5
The influence of VAE feature dim
#53 opened by Tom-zgt - 2
About training with cached vae latents
#52 opened by RohollahHS - 3
Challenges in Memorizing Single or Few Images
#49 opened by kifarid - 2
About Encoder and Decoder in MAE
#42 opened by RohollahHS - 31
About Training
#36 opened by pokameng - 5
Faster training with fp16 or bf16
#29 opened by shaochenze - 3
Request for Causal AR Version Release
#39 opened by aengusng8 - 1
Reproducing the BASE model.
#48 opened by cxxgtxy - 5
Model and training code for the AR variant
#34 opened by MikeWangWZHL - 2
Dataset used for training...
#50 opened by Nobody-Zhang - 2
Reconstruction loss in ELBO
#51 opened by Paulmzr - 1
Small Issue/error in the code
#47 opened by niklasbubeck - 14
Add HF integration to MAR
#32 opened by jadechoghari - 1
About masking ratio
#43 opened by RohollahHS - 1
Training Problems
#44 opened by drx-code - 2
Unable to calculate FID
#41 opened by xbyym - 8
Doesn't work well on speech generation tasks.
#35 opened by FacePoluke - 5
Difference Between MAR and MAGE
#30 opened by JeremyCJM - 2
CFG for cross-entropy
#38 opened by shaochenze - 4
About Training Loss
#37 opened by Ferry1231 - 1
Is Autoencoder ok?
#33 opened by Ferry1231 - 9
The CFG strategy - linear vs. constant
#31 opened by yuhuUSTC - 18
Question on the Value of Training Loss for DiffuLoss with MAR and Causal Methods
#20 opened by bugWholesaler - 3
Generate images with arbitrary resolutions
#28 opened by Leiii-Cao - 1
Inference details of an ablation experiment.
#26 opened by tgxs002 - 2
VAE decoded as NaN in early stages of training
#25 opened by xiazhi1 - 16
Train Code for VAE Used in Paper
#19 opened by Ferry1231 - 9
Buffer Size for Class Condition
#21 opened by zhuole1025 - 18
FID evaluation reference data
#8 opened by MArSha1147 - 2
Why does main_cache not use flip augmentation?
#24 opened by xiazhi1 - 2
How should inference be performed when using VQ-16 (discrete)? During decoding, should we use the AR output for VQ and then decode?
#23 opened by Tom-zgt - 4
About the mask schedule during training
#22 opened by zythenoob - 3
MAR for Image-to-Image Generation
#18 opened by Bili-Sakura - 2
Training settings for MAR series
#16 opened by HuangOwen - 4
Latent Dimensions of VAE
#15 opened by Vinnieassaulter - 1
Loss for training KL-VAE
#13 opened by Vinnieassaulter - 4
Training epochs
#12 opened by sihyun-yu - 4
Why not use the whole DiT block?
#10 opened by WeitaoLu - 2
Generation FID is much lower than Reconstruction FID for models using VQ-16 (discrete) provided by LDM codebase
#6 opened by ShiFengyuan1999 - 2
The Impact of MLP Depth
#5 opened by Robertwyq