argmaxinc/DiffusionKit

Precompute modulation outputs and offload related parameters


Draw Things realized that the modulation outputs can be computed before the diffusion/denoising loop, so the parameters that produce them can be offloaded early.
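A minimal sketch of the pattern, assuming the modulation input depends only on values known before the loop starts (here, just the timestep) and never on the latent being denoised. `ToyModulation`, `precompute_modulation`, and both `denoise_*` functions are hypothetical names for illustration, not DiffusionKit's API, and scalars stand in for tensors:

```python
import math
from typing import Dict, List, Tuple


class ToyModulation:
    """Hypothetical stand-in for a modulation layer (not DiffusionKit's class)."""

    def __init__(self, weight: float, bias: float) -> None:
        self.weight = weight
        self.bias = bias

    def __call__(self, cond: float) -> Tuple[float, float]:
        h = cond / (1.0 + math.exp(-cond))  # SiLU activation
        shift = self.weight * h + self.bias
        scale = 1.0 + self.weight * h
        return shift, scale


def denoise_inline(x: float, timesteps: List[float], mod: ToyModulation) -> float:
    """Baseline: the modulation layer stays resident and runs at every step."""
    for t in timesteps:
        shift, scale = mod(t)
        x = x * scale + shift
    return x


def precompute_modulation(
    timesteps: List[float], mod: ToyModulation
) -> Dict[float, Tuple[float, float]]:
    """Evaluate the modulation layer once per timestep, before the loop."""
    return {t: mod(t) for t in timesteps}


def denoise_precomputed(
    x: float, timesteps: List[float], table: Dict[float, Tuple[float, float]]
) -> float:
    """Same loop, but it only reads cached outputs; no modulation weights needed."""
    for t in timesteps:
        shift, scale = table[t]
        x = x * scale + shift
    return x


timesteps = [1.0, 0.75, 0.5, 0.25]
mod = ToyModulation(weight=0.1, bias=0.01)

expected = denoise_inline(1.0, timesteps, mod)

table = precompute_modulation(timesteps, mod)
del mod  # stands in for offloading the modulation parameters
result = denoise_precomputed(1.0, timesteps, table)
```

The cached table holds one small output per timestep, while the weights that produced it no longer need to stay in memory during the loop. Both loops apply the same values in the same order, so the results match.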

Expected impact for FLUX.1 [schnell] is a 6.5 GB reduction in peak memory from offloading 3.25B float16 parameters (2 bytes each).

In [1]: round(sum([p.numel() for p in self.parameters()]) / 1e9, 2)
Out[1]: 11.89

In [2]: round(sum([p.numel() for n,p in self.named_parameters() if "mod" not in n]) / 1e9, 2)
Out[2]: 8.64
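The estimate follows from the two counts above. As a quick check, assuming 2 bytes per float16 parameter and 1 GB = 1e9 bytes (`modulation_savings_gb` is a hypothetical helper, not part of DiffusionKit):

```python
from typing import Tuple


def modulation_savings_gb(
    total_b: float, without_mod_b: float, bytes_per_param: int = 2
) -> Tuple[float, float]:
    """Return (modulation params in billions, memory in GB) from two counts in billions."""
    mod_params_b = total_b - without_mod_b
    # Billions of parameters times bytes per parameter gives GB (1e9 bytes).
    return mod_params_b, mod_params_b * bytes_per_param


mod_b, saved_gb = modulation_savings_gb(11.89, 8.64)
print(round(mod_b, 2), round(saved_gb, 2))  # prints: 3.25 6.5
```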

Implemented in #14