Precompute modulation outputs and offload related parameters
atiorh commented
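Draw Things observed that the modulation outputs can be computed before the diffusion (denoising) loop starts, because they depend only on the timestep and the pooled conditioning, not on the latents being denoised. Once these outputs are cached, the modulation parameters are no longer needed and can be offloaded early.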
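A minimal pure-Python sketch of the idea follows. It assumes, as in FLUX, that each block's modulation input depends only on the timestep and the pooled text conditioning. `ToyModulation`, `cond_vec`, `block` and `denoise` are hypothetical names, plain lists stand in for tensors, and setting the weights to `None` stands in for offloading them.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


class ToyModulation:
    """Hypothetical stand-in for one block's modulation layer: a linear map
    applied to SiLU(vec) that returns a (shift, scale) pair."""

    def __init__(self, weights: List[Vector]):
        # Two rows: one produces the shift, the other the scale.
        self.weights: Optional[List[Vector]] = weights

    def __call__(self, vec: Vector) -> Tuple[float, float]:
        if self.weights is None:
            raise RuntimeError("modulation weights were offloaded")
        act = [v / (1.0 + math.exp(-v)) for v in vec]  # SiLU(v) = v * sigmoid(v)
        shift, scale = (sum(w * a for w, a in zip(row, act)) for row in self.weights)
        return shift, scale


def cond_vec(t: float, pooled: Vector) -> Vector:
    # Assumption: the modulation input depends only on the timestep and the
    # pooled text embedding, never on the latent being denoised.
    return [t * p for p in pooled]


def block(x: Vector, shift: float, scale: float) -> Vector:
    # Placeholder for a transformer block; only its modulation matters here.
    return [xi * (1.0 + scale) + shift for xi in x]


def denoise(x: Vector, mods: List[ToyModulation], timesteps: List[float],
            pooled: Vector, precompute: bool) -> Vector:
    table: Dict[Tuple[int, int], Tuple[float, float]] = {}
    if precompute:
        # Evaluate every modulation layer for every step before the loop...
        for s, t in enumerate(timesteps):
            vec = cond_vec(t, pooled)
            for b, mod in enumerate(mods):
                table[(s, b)] = mod(vec)
        # ...then drop the weights (stand-in for offloading them).
        for mod in mods:
            mod.weights = None
    for s, t in enumerate(timesteps):
        for b, mod in enumerate(mods):
            if precompute:
                shift, scale = table[(s, b)]
            else:
                shift, scale = mod(cond_vec(t, pooled))
            x = block(x, shift, scale)
    return x


def make_mods(n: int = 3) -> List[ToyModulation]:
    return [ToyModulation([[0.1, -0.2], [0.05, 0.03]]) for _ in range(n)]


timesteps = [1.0, 0.75, 0.5, 0.25]
pooled = [0.3, -0.4]
eager = denoise([1.0, 2.0], make_mods(), timesteps, pooled, precompute=False)
offloaded_mods = make_mods()
cached = denoise([1.0, 2.0], offloaded_mods, timesteps, pooled, precompute=True)
```

Both runs produce the same latent, but the precomputed run no longer touches the modulation weights once the loop starts. The trade-off is that the step schedule must be known up front. In a real model, each cached entry is a few hidden-size vectors, far smaller than the weight matrices that produced it.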
The expected impact for FLUX.1[Schnell] is a 6.5GB reduction in peak memory, from offloading 3.25B parameters stored in float16 (2 bytes each).
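The 6.5GB figure is plain arithmetic: 11.89B total minus 8.64B non-modulation parameters leaves 3.25B modulation parameters, at 2 bytes each in float16. The small helper below (the name `offload_savings_gb` is hypothetical) makes the dependence on precision explicit. It assumes GB means 10^9 bytes.

```python
def offload_savings_gb(total_params_b: float, kept_params_b: float,
                       bytes_per_param: int = 2) -> float:
    """Peak-memory reduction in GB (10**9 bytes) from offloading every
    parameter that is not kept resident. bytes_per_param=2 is float16."""
    offloaded_b = total_params_b - kept_params_b  # billions of parameters
    return round(offloaded_b * bytes_per_param, 2)


# 11.89B total vs. 8.64B non-modulation parameters (counts from this issue)
savings = offload_savings_gb(11.89, 8.64)
```

With `bytes_per_param=4` (float32), the same split would free 13GB.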
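Parameter counts in billions: all parameters vs. only those whose names do not contain "mod":

```python
In [1]: round(sum([p.numel() for p in self.parameters()]) / 1e9, 2)
Out[1]: 11.89

In [2]: round(sum([p.numel() for n,p in self.named_parameters() if "mod" not in n]) / 1e9, 2)
Out[2]: 8.64
```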
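The split can also be reproduced without depending on a particular framework. The sketch below takes (name, element-count) pairs; with PyTorch, these could come from `((n, p.numel()) for n, p in model.named_parameters())`. The function name and the toy entries are hypothetical. A plain substring test matches any name containing "mod", so it is worth checking that no non-modulation parameter happens to match.

```python
from typing import Iterable, Tuple


def split_param_counts(named_numels: Iterable[Tuple[str, int]],
                       key: str = "mod") -> Tuple[int, int]:
    """Return (total, kept), where 'kept' excludes every parameter whose
    name contains `key` (the same substring filter used in In [2])."""
    total = kept = 0
    for name, numel in named_numels:
        total += numel
        if key not in name:
            kept += numel
    return total, kept


# Illustrative names and sizes only; not real FLUX.1 parameter shapes.
toy = [
    ("double_blocks.0.img_mod.lin.weight", 6),
    ("double_blocks.0.img_attn.qkv.weight", 9),
    ("single_blocks.0.modulation.lin.weight", 3),
]
total, kept = split_param_counts(toy)
```

Here `total == 18` and `kept == 9`, because both the `img_mod` and `modulation` entries are excluded.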