lucidrains/recurrent-interface-network-pytorch

follow up paper

Opened this issue · 8 comments

recently ran into a researcher who told me there is a follow up paper to this work

does anyone know of it?

It could be this one: https://arxiv.org/pdf/2405.20324 (Nicolas Dufour et al., CVPR 2024), which extends RIN to text conditioning.

@StevenLiuWen very cool! and not the original author(s)!


Also, another work, PointInfinity (https://arxiv.org/pdf/2404.03566), applied it to 3D point cloud generation. RIN- and Perceiver-IO-style architectures have a nice property for handling high-resolution data, since the expensive computation happens on a small, fixed-size set of latents rather than on the full input. Looking forward to more potential applications.
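
For intuition, here is a minimal sketch (not this repo's actual implementation; the latent count, dimensions, and single-block structure are assumptions) of the read/compute/write pattern that gives RIN and Perceiver-IO that property: the quadratic self-attention runs only over a small set of latents, and the (possibly huge) data sequence is touched only by cross-attention.

```python
import torch
from torch import nn

class RINBlockSketch(nn.Module):
    """Toy read -> compute -> write block; real RIN also uses latent self-conditioning."""
    def __init__(self, dim, num_latents=256, heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.read    = nn.MultiheadAttention(dim, heads, batch_first=True)       # latents <- data
        self.compute = nn.TransformerEncoderLayer(dim, heads, batch_first=True)  # latents <-> latents
        self.write   = nn.MultiheadAttention(dim, heads, batch_first=True)       # data <- latents

    def forward(self, data):  # data: (batch, n_tokens, dim); n_tokens can be large
        lat = self.latents.expand(data.shape[0], -1, -1)
        lat = lat + self.read(lat, data, data)[0]     # cross-attention: O(n_tokens * num_latents)
        lat = self.compute(lat)                       # self-attention over latents only: O(num_latents^2)
        data = data + self.write(data, lat, lat)[0]   # write latents back into the data tokens
        return data

# e.g. a 64x64 image patchified into 4096 tokens, but quadratic attention only over 256 latents
x = torch.randn(2, 4096, 512)
print(RINBlockSketch(512)(x).shape)  # torch.Size([2, 4096, 512])
```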

indeed, thank you!

This paper is a direct extension by one of the original authors (Ting Chen):

FIT: Far-reaching Interleaved Transformers
Ting Chen, Lala Li
https://arxiv.org/abs/2305.12689

Only skimmed it, but it looks like they just add local self-attention layers to the data branch of RIN. A bit hard to interpret their diffusion results because they only report MSE. It seems reasonable that local self-attention over the pixels would help though.

@justinlovelace that's an interesting paper too! 🙏


probably using NATTEN (neighborhood attention) on the data branch would work even better
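
To make the idea concrete, here is a minimal sketch of local self-attention over the data tokens. The window size, the residual placement, and the use of plain non-overlapping windows (rather than FIT's exact scheme) are assumptions; a NATTEN-style neighborhood attention module could be dropped in at the same spot.

```python
import torch
from torch import nn

class LocalSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows along the data sequence."""
    def __init__(self, dim, heads=8, window=64):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (batch, n_tokens, dim); n_tokens assumed divisible by window
        b, n, d = x.shape
        xw = x.reshape(b * n // self.window, self.window, d)  # fold windows into the batch dim
        out = self.attn(xw, xw, xw)[0]                        # full attention, but only within a window
        return x + out.reshape(b, n, d)                       # residual back onto the data branch

# interleave this on the data branch with the latent read/write from the sketch above
x = torch.randn(2, 4096, 512)
print(LocalSelfAttention(512)(x).shape)  # torch.Size([2, 4096, 512])
```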

Has anyone tried this purely on text?
I am currently working to adapt it for text, so it would vaguely be a "diffusion language model", but I wanted to know if there are any similar works or negative results from folks who have tried it already (cc @justinlovelace, I would be interested in your thoughts/opinions).
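
In case it helps the discussion, here is a heavily hedged sketch of one possible way to point a RIN-style denoiser at text, using continuous token embeddings as the data sequence. The noising scheme, the losses, the reuse of the toy `RINBlockSketch` from above, and the omission of time conditioning are all assumptions, not a recommended recipe.

```python
import torch
from torch import nn

vocab, dim, seq_len = 32000, 512, 256
embed     = nn.Embedding(vocab, dim)
denoiser  = RINBlockSketch(dim)      # stand-in for a full RIN; toy block sketched earlier in the thread
to_logits = nn.Linear(dim, vocab)    # "rounds" predicted embeddings back to discrete tokens

tokens = torch.randint(0, vocab, (2, seq_len))
x0 = embed(tokens)                                   # clean "data" sequence lives in embedding space
t = torch.rand(2, 1, 1)                              # per-sample noise level (no time conditioning here)
noised = (1 - t) * x0 + t * torch.randn_like(x0)     # simple interpolation noising

pred = denoiser(noised)                              # predict the clean embeddings
loss = nn.functional.mse_loss(pred, x0) \
     + nn.functional.cross_entropy(to_logits(pred).transpose(1, 2), tokens)
loss.backward()
```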