zhengchen1999/RGT

Comparison to DAT?

Opened this issue · 5 comments

Hey, thank you for your work. I had just a quick question.

I didn't look into (or understand) the technical details too much; I am writing simply from the practical/application side (I tried training some models).

From a practical, application standpoint, RGT feels fairly similar to DAT to me personally, both in training and inference speed and in output quality.

I also recently ran an inference speed test, and RGT got very similar speeds to DAT (and RGT-S to DAT-S).

On the theoretical side, I simply wanted to ask whether there might be a use case where DAT would be preferable to (or should perform better than) RGT, or RGT preferable to DAT?

PS: Here are some outputs/examples of an RGT model I recently trained:

Slowpoke Pics 6 Examples

4xRealWebPhoto_RGT_250k_comparison

4xRealWebPhoto_RGT_250k_comparison_2

---- The inference speed test mentioned above:

4x inference speed test, neosr test scripts, 50 256x256 images as input, Ubuntu (Budgie) 23.10, GeForce RTX 3060 Lite Hash Rate, AMD Ryzen™ 5 3600 × 12

Networks sorted by the fastest of 3 runs each (I started out with more runs, so Compact and CUGAN got more, but once I reached DAT I switched to 3 runs for each); a rough timing sketch is included after the list:

Compact: 1.90s, 26.35fps
SPAN: 2.33s, 21.44fps
SAFMN: 2.51s, 19.89fps
DITN: 4.26s, 11.72fps
CUGAN: 4.45s, 11.22fps
OmniSR: 8.90s, 5.62fps
SAFMN-L: 9.87s, 5.07fps
CRAFT: 11.26s, 4.44fps
DCTLSA: 11.53s, 4.43fps
SwinIR-S: 14.18s, 3.53fps
SRFormer-light: 16.28s, 3.07fps
ESRGAN: 22.51s, 2.22fps
SwinIR-M: 46.46s, 1.08fps
HAT-S: 71.37s, 0.70fps
RGT_S: 74.83s, 0.67fps
DAT-S: 74.96s, 0.67fps
SRFormer-M: 79.02s, 0.63fps
DAT2: 81.90s, 0.61fps
HAT-M: 90.19s, 0.55fps
RGT: 96.07s, 0.52fps
DAT: 97.08s, 0.52fps
HAT-L: 177.75s, 0.28fps
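For reference, a minimal sketch of roughly what such a timing loop can look like in PyTorch. This is a simplification, not the actual neosr test script; `model`, the `inputs_256` folder, and the image count are placeholders/assumptions:

```python
# Minimal timing sketch (not the neosr test script): average inference time over
# a folder of 256x256 images with a 4x SR network already loaded as `model`.
import time
from pathlib import Path

import torch
from torchvision.io import read_image

device = torch.device("cuda")
model = model.to(device).eval()                          # placeholder: any 4x SR network

files = sorted(Path("inputs_256").glob("*.png"))[:50]    # 50 test images (assumption)
torch.cuda.synchronize()
start = time.time()

with torch.inference_mode():
    for f in files:
        lr = read_image(str(f)).float().div(255).unsqueeze(0).to(device)  # (1, 3, 256, 256)
        sr = model(lr)                                                     # (1, 3, 1024, 1024)

torch.cuda.synchronize()                                 # wait for GPU work before stopping the clock
elapsed = time.time() - start
print(f"{elapsed:.2f}s, {len(files) / elapsed:.2f}fps")
```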

Hi,

Thank you for your detailed testing. The latency of RGT and DAT is close, which aligns with their calculated Params and FLOPs. However, RGT performs better; for instance, on the Urban100 dataset, the comparison is as follows:

Urban100 (PSNR)   Scale x2   Scale x4
DAT-S             34.12      27.68
RGT-S             34.32      27.89
DAT               34.37      27.87
RGT               34.47      27.98

This is because, unlike DAT, which employs channel attention to achieve linear complexity, RGT adopts linear global spatial attention, which is more suited for SR tasks.
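To make the complexity point concrete, here is a toy sketch (not the actual DAT or RGT code): a channel ("transposed") attention map is C×C, so its cost grows linearly with the number of pixels, whereas a vanilla spatial attention map is N×N and quadratic; RGT instead keeps the attention in the spatial dimension while remaining linear.

```python
# Toy shape/complexity sketch (not the actual DAT or RGT code).
import torch

B, C, H, W = 1, 64, 64, 64
N = H * W                                    # number of spatial tokens (pixels)
q = k = v = torch.randn(B, N, C)             # skip the projections; only shapes matter here

# Vanilla spatial self-attention: N x N map -> O(N^2 * C), quadratic in image size.
spatial_attn = torch.softmax(q @ k.transpose(-2, -1) / C ** 0.5, dim=-1)   # (B, N, N)
out_spatial = spatial_attn @ v                                              # (B, N, C)

# Channel ("transposed") self-attention: C x C map -> O(C^2 * N),
# i.e. linear in the number of pixels.
channel_attn = torch.softmax(q.transpose(-2, -1) @ k / N ** 0.5, dim=-1)    # (B, C, C)
out_channel = (channel_attn @ v.transpose(-2, -1)).transpose(-2, -1)        # (B, N, C)

print(out_spatial.shape, out_channel.shape)  # both (1, 4096, 64)
```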

Sounds good, thank you for your work :)

I was able to train an RGT and an RGT-S model. Results look good. My latest RGT-S one can be found here.
In this PDF I wrote down my approach and the degradation workflow for this model.

It's called '4xRealWebPhoto_v2_rgt_s' since the idea was to upscale photos downloaded from the web, so I modeled the dataset degradations to include scaling and compression, then rescaling and recompression (as a service provider would apply when a user uploads to social media, and then again when someone else downloads and re-uploads the image). I also added realistic noise and some slight lens blur. A rough sketch of that kind of degradation pipeline is below.
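The exact steps and values are in the linked PDF; as a simplified illustration of the idea only (Pillow-based, with made-up blur/noise/quality ranges rather than the ones I actually used), it looks something like this:

```python
# Simplified illustration of the degradation idea (not the exact pipeline from the PDF):
# blur -> downscale + JPEG -> noise -> rescale + re-JPEG, with random parameters,
# turning an HR image into a "downloaded from the web" style LR image.
import io
import random

import numpy as np
from PIL import Image, ImageFilter


def degrade(hr: Image.Image, scale: int = 4) -> Image.Image:
    w, h = hr.size
    img = hr.convert("RGB")

    # slight lens blur
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.2, 1.0)))

    # first scaling + compression (e.g. the hosting service)
    img = img.resize((w // scale, h // scale), Image.BICUBIC)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(60, 95))
    buf.seek(0)
    img = Image.open(buf).convert("RGB")

    # noise (plain Gaussian here for simplicity)
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0.0, random.uniform(1.0, 4.0), arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    # rescaling + recompression (someone downloads and re-uploads it)
    factor = random.uniform(0.8, 1.0)
    small = img.resize((int(img.width * factor), int(img.height * factor)), Image.BICUBIC)
    img = small.resize((w // scale, h // scale), Image.BICUBIC)  # back to the LR size
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(50, 90))
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```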

12 slow.pics examples of the model's output are in this link.

And below, simply four examples:
4xRealWebPhoto_v2_rgt_s_ex1
4xRealWebPhoto_v2_rgt_s_ex2
4xRealWebPhoto_v2_rgt_s_ex4
4xRealWebPhoto_v2_rgt_s_ex3

@Phhofm Hi. I was wondering if you could create a tutorial on how to further train models. I have an RTX 3090 graphics card with 24GB VRAM and 32GB of RAM, and I'd like to train a model for the first time. By the way, I've been using open-source projects from GitHub for 4-5 years now, so I believe with a little help I can get the hang of it. Thanks a lot!

Hey, you could definitely get into training your own upscaling models.
You can find a few links in this readme that could help get you started: training-info
neosr is what I have been using for training.
I might suggest training a Compact model first; that helps you gather experience, because for it you will have to download a dataset, then use OTF degradation or degrade it yourself, get the config correct, set up the dependencies, and so forth.
Once you have started training a Compact model with default config values on a standard dataset and you see validation output being generated, you should have gathered enough experience to make/degrade your own dataset, tinker with config values, and move to a bigger transformer arch like this one. That is basically how I started, and I think it's a good way.
It might also be good to join the Discord community linked in the readme and ask questions there, since there are more people who can answer your questions or help with errors you run into. But yeah, the linked readme should get you started with resources to look into.

@Phhofm Would you mind updating the link to the Google Drive folder with new models in your "models" GitHub repository? I've successfully used some of them in Hybrid, a video editing program. Thanks!