FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Python · MIT
Issues
- 0
Conflicting Instructions for computing FID
#86 opened by Kumbong - 2
FID of VQVAE
#85 opened by RohollahHS - 0
generate images with arbitrary resolutions
#84 opened by Leiii-Cao - 1
How can I train VAR with my own data? Does the num_class parameter in VAR have any effect?
#82 opened by zhangqingwu - 1
- 1
About in-painting tasks
#74 opened by zhiyuanyou - 0
class token replaces first token
#81 opened by cyw-3d - 1
From another angle, could the multi-scale (ms) codebook also be equivalent to another kind of latent diffusion?
#72 opened by YilanWang - 3
How many GPUs did you use for training?
#44 opened by daixiangzi - 6
Cannot align FID with provided checkpoint
#69 opened by LiCHH - 0
- 1
How can I train this model on a custom dataset?
#79 opened by morestart - 2
Weird/inconsistent IS evaluation score.
#77 opened by ChenDRAG - 1
Image reconstruction via Transformer.
#55 opened by minimini-1 - 1
Why do multi-scale features partially share a convolution network via PhiPartiallyShared?
#73 opened by sunset-clouds - 0
Usage of classifier-free guidance
#76 opened by ChenDRAG - 3
We used VAR-CLIP to train a text-to-image model on ImageNet, and the results seem pretty good.
#75 opened by daixiangzi - 2
T2I Generation
#64 opened by ucasyjz - 0
- 1
https://var.vision/demo is borked
#68 opened by yosun - 0
for in/out painting
#70 opened by Youngwoo-git - 4
Question about the cross-entropy loss average
#53 opened by Yheechou - 2
The 512 checkpoint?
#57 opened by FanqingM - 1
the size of latent space
#65 opened by xinding64 - 1
Finetuning on own data
#66 opened by LetsGoFir - 1
question about ema_vocab_hit_SV
#67 opened by shliu0 - 0
VQVAE training code
#63 opened by Junda24 - 0
the patch_nums of 256*256 image
#62 opened by xinding64 - 2
- 0
Training code for VQVAE
#60 opened by zhangjingze21 - 0
There was no increase in speed after installing flash-attn and xformers.
#59 opened by kongwanbianjinyu - 2
Inference after training on own dataset
#58 opened by moeinheidari7829 - 2
training code for VQVAE
#49 opened by Junda24 - 2
Question on autoregressive_infer_cfg
#56 opened by sparse-mvs-2 - 2
Is a 16x16 codebook size also used when training on 512x512-resolution images?
#54 opened by YilanWang - 3
Question about the details of the two-stage ablation
#50 opened by YilanWang - 1
Question about the images generated by the demo
#52 opened by BigConsin - 0
Scalability to multimodal large language models?
#51 opened by DEBIHOOD - 2
Training with multiple GPUs gets stuck
#47 opened by Erisura - 0
Can the training log be released?
#48 opened by daixiangzi - 4
FID misalignment
#45 opened by ckczzj - 1
class VQVAE forward function error
#46 opened by woldier - 3
How is the multi-level VectorQuantizer used in the VQVAE (stage 1) phase?
#42 opened by YilanWang - 2
AdaLNSelfAttn.forward: what is the purpose of this statement?
#43 opened by yanghu819 - 2
Abnormal sample results with `demo_sample.ipynb`
#41 opened by karrykkk - 1
- 5
AR Time Complexity?
#35 opened by isaacrob - 1
- 3
About progressive training
#37 opened by ParanoidHW - 2