dbolya/tomesd

Analytical Results

sahilsakhuja opened this issue · 1 comments

Hi,

First off - thanks for the innovative work that you have put into this technology.

I conducted a small analysis of the performance of Token Merging and have shared a detailed report here:
Token Merging: An Analysis

In a nutshell: Token Merging is clearly very impactful in terms of efficiency, but I would recommend testing it in ways that mirror real-life use cases, since the resulting images change significantly at higher merging ratios. As guidance, it might be prudent to recommend merging ratios below 50%.

Please do take a look. I hope this adds value to your future efforts!

dbolya commented

Thanks for the write-up! I took a look, and your results look worse than what I was able to get. That might just be a difference in prompts, though, since I was using animal photos and such. I would agree with your conclusion: you probably want to keep the merging ratio at 50% or less if applying ToMe in the first pass. Interesting that you found longer prompts actually resulted in better images with ToMe!

One thing I want to add though is that the first pass is usually pretty quick anyway. What ToMe is really useful for is during the second pass when generating really high res images, where it can really cut down on evaluation time with much less impact on quality.
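
To put rough numbers on why the high-res second pass benefits most, here is a back-of-the-envelope sketch. It assumes the usual 8x VAE downsampling of Stable Diffusion, so a 512x512 image gives a 64x64 latent, and it treats self-attention cost as scaling with the square of the token count. That is an approximation: it ignores memory-efficient attention kernels and the overhead of the merge itself. The function names are made up for illustration and are not part of tomesd.

```python
def token_count(height, width, vae_factor=8):
    """Tokens seen by the highest-resolution attention layers.

    Assumes the latent is the image downsampled by `vae_factor` per side.
    """
    return (height // vae_factor) * (width // vae_factor)


def tokens_after_merge(n_tokens, ratio):
    """Tokens left after merging away a `ratio` fraction of them."""
    return n_tokens - int(n_tokens * ratio)


def relative_attention_cost(ratio):
    """Approximate self-attention cost relative to no merging.

    Assumes cost is quadratic in the number of tokens kept.
    """
    kept = 1.0 - ratio
    return kept * kept


if __name__ == "__main__":
    ratio = 0.5
    for height, width in [(512, 512), (1024, 1024), (2048, 2048)]:
        n = token_count(height, width)
        kept = tokens_after_merge(n, ratio)
        print(f"{height}x{width}: {n} tokens -> {kept} after merging at ratio {ratio}")
    print(f"approx. relative attention cost at ratio {ratio}: {relative_attention_cost(ratio)}")
```

The relative saving is the same at every resolution under this model, but the absolute token count grows 4x each time the side length doubles, so the time saved is much larger in the second pass.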