lucidrains/transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
Python · MIT
Issues
todo
#29 opened by lucidrains - 75
Image Generation
#22 opened by taeinkwon - 1
issue with checkpoint saving
#30 opened by mijanur132 - 4
Generate text description of image prompt
#27 opened by mijanur132 - 1
Unsure of the functionality of one line
#28 opened by Ivan-Zhong - 3
Any idea about image caption implementation.
#26 opened by Ma-Weijian - 3
Example of using VAE for image.
#20 opened by dingkwang - 2
Default times are multiplied by batch size
#23 opened by RefractAI - 3
Issue with "decoding text" not finishing
#21 opened by siyuan5 - 15
Why are you eliminating the influence of positional information on modalities using RoPE?
#19 opened by shin-wn - 1
modality_length_to_times_fn always default
#18 opened by RefractAI - 4
[BUG] Fail to run the test samples
#17 opened by Masaaki-75 - 1
omnigen plz
#15 opened by af-74413592 - 1
Could you prepare a reasoning demo?
#13 opened by win10ogod - 6
Got error when running example
#12 opened by Bing1002 - 2
text layernorm
#11 opened by cliangyu - 1
Question about Diffusion Loss
#8 opened by JJJYmmm - 3
Bug: modality_token_transform is empty?
#6 opened by GindaChen - 1
Is there any pretrained model weight?
#5 opened by chenfengshijie - 1
Question
#1 opened by zhang-haojie - 1
Batch size >1 not working, and loss queries
#3 opened by RefractAI - 4