rentruewang/koila

Using Koila with Big Sleep?

illtellyoulater opened this issue · 5 comments

Hi, this project could be revolutionary, if only I knew how to use it :)

You've surely heard of Big Sleep, right? Using CLIP and BigGAN, it can generate amazing visuals and unique works of art from just a line of text, which is why it's getting more and more popular among an ever-growing number of artists and curious people fascinated by the potential of these techniques...

However, many of us haven't been able to run these kinds of projects on our machines because of low VRAM in consumer GPUs and crazy market prices, and we ended up stumbling almost immediately on the infamous CUDA out-of-memory error... (Yes, Google Colab is nice and all, but running these projects locally makes for a totally different kind of "technological chill", if you know what I mean :) )

So, I was thinking, would it be possible to apply Koila to Big Sleep, to fix those errors?
If so, that'd be a game changer! It would at the same time benefit a huge number of users, and translate into massive traction for Koila!
Looking at the README, I thought the whole process would be very simple, so I tried looking into it myself... but in the end I had to give up, because I've only just approached this field and I still lack much of the background needed to figure out these kinds of details.

So yeah, would you consider providing a short example for this use case of Koila + Big Sleep, if feasible? In that case just a few lines of code could potentially mean the beginning of a little revolution :)

Hi, thanks for the write up!

As a poor student, I feel you; the constant struggle to reduce memory usage so that our machines can handle it :(

As it stands, I could not make it work on Google Colab, which uses Python 3.7 (#7), and the library does not support a wide range of layers yet. In theory it could, but I haven't found the time to implement all those interfaces. See #18 for the current status. Right now, it is more of a proof of concept than a production-ready library.

With that said, regarding your problem: since koila works as an automatic gradient accumulator (it slices up batches, calculates the available memory, and then accumulates the gradients for you), have you tried using gradient accumulation to reduce memory usage yet? If not, it's a great way to lower the memory footprint (on GPUs) of deep learning algorithms.
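For reference, here's a minimal sketch of plain gradient accumulation in PyTorch. The model, batch sizes, and `accumulate_steps` are made up for illustration; this is not code from Big Sleep or koila:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

full_batch, targets = torch.randn(16, 8), torch.randn(16, 1)
accumulate_steps = 4  # process the batch of 16 as 4 micro-batches of 4

optimizer.zero_grad()
for x, y in zip(full_batch.chunk(accumulate_steps), targets.chunk(accumulate_steps)):
    # Scale each micro-batch loss so the accumulated gradient
    # matches the gradient of the full-batch loss.
    loss = loss_fn(model(x), y) / accumulate_steps
    loss.backward()  # .grad accumulates across backward() calls
grads = [p.grad.clone() for p in model.parameters()]  # accumulated gradients
optimizer.step()
```

Only one micro-batch's activations live in memory at a time, which is where the savings come from.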

@rentruewang thanks for your reply! Now I understand things a little better.

I believe that Big Sleep is already using gradient accumulation.
In fact, it provides a "--gradient_accumulate_every=" option right in the command line interface, which defaults to 4.

I tried increasing that number to much higher values, but it didn't help; I always received "RuntimeError: CUDA out of memory".
So that probably means Koila won't help anyway in this situation, correct?

If so, well, thanks anyway for your help... at least I learned more about how koila and gradient accumulation work, and I might be able to apply these concepts to other projects until koila reaches a more mature state.

@illtellyoulater Actually, the gradient accumulation methods are quite different, so using koila in big sleep may not help. Let me explain.

Disclaimer: This analysis is my explanation after a quick read over big sleep's source code.

Big Sleep works by taking in a text prompt and performing gradient descent on a generated image, optimizing it to minimize the 'distance' that CLIP computes between the image and the text.

Here's how gradient_accumulate_every is used in the repo:

https://github.com/lucidrains/big-sleep/blob/49b20f9c8169667395b68d1bbe169d28137fea8e/big_sleep/big_sleep.py#L449-L453

It is used to increase the norm of the gradients, making the optimizer step bigger. I'm curious why he uses gradient accumulation here; I may be wrong, but since the inputs are the same each time, it's effectively just scaling the gradients by an integer factor (there's no need for gradient accumulation here).
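To illustrate the "same inputs, integer scaling" point, here's a tiny autograd example with toy tensors (nothing from Big Sleep's actual code):

```python
import torch

x = torch.ones(3, requires_grad=True)

# One backward pass: d(sum(x**2))/dx = 2 * x
(x ** 2).sum().backward()
single = x.grad.clone()

# Backward on the identical computation 4 times: the gradients
# simply accumulate into x.grad, yielding the single-pass
# gradient scaled by the integer factor 4.
x.grad = None
for _ in range(4):
    (x ** 2).sum().backward()

print(torch.allclose(x.grad, 4 * single))  # True
```

So accumulating over identical inputs buys no memory savings; it only rescales the step, which an adjusted learning rate would do equally well.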

Compared to that, koila works by splitting up a batch. It:

  1. Finds the batch size that fits into memory.
  2. Splits a batch if the batch size is larger than that threshold.
  3. Passes the splits through the model.
  4. Accumulates the gradients.
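The steps above can be sketched roughly like this. This is a toy illustration of the idea, not koila's actual implementation; in particular, the memory-estimation step (1) is replaced here by a hard-coded `max_batch_size`:

```python
import torch
from torch import nn

def accumulate_in_chunks(model, loss_fn, inputs, targets, max_batch_size):
    """Split the batch into chunks of at most `max_batch_size` (step 2),
    forward each chunk through the model (step 3), and accumulate the
    gradients (step 4) so the result matches one full-batch backward pass."""
    n = inputs.shape[0]
    model.zero_grad()
    for start in range(0, n, max_batch_size):
        x = inputs[start:start + max_batch_size]
        y = targets[start:start + max_batch_size]
        # Weight each chunk's mean loss by its share of the full batch,
        # so the accumulated gradient equals the full-batch gradient.
        loss = loss_fn(model(x), y) * (x.shape[0] / n)
        loss.backward()

torch.manual_seed(0)
model = nn.Linear(4, 1)
inputs, targets = torch.randn(10, 4), torch.randn(10, 1)
accumulate_in_chunks(model, nn.MSELoss(), inputs, targets, max_batch_size=3)
```

The effective batch size stays the same; only the peak activation memory per forward pass shrinks.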

So you see, it reduces the batch size in forward passes but uses gradient accumulation to maintain the same effective batch size. That means it has the following limitations:

  1. When the batch size is already 1, then there's nothing to gain by using koila.
  2. If the author doesn't use batches, then koila doesn't work at all.

Hope this explanation makes sense!

Great info, thanks! It could probably be useful to add this to the repository README!

Good idea. Thanks!