kohya-ss/sd-scripts

[Feature Request] Use gzip to reduce latent file size

Opened this issue · 3 comments

sdbds commented

From a Reddit post by the JoyCaption author:
https://old.reddit.com/r/StableDiffusion/comments/1gdkpqp/the_gory_details_of_finetuning_sdxl_for_40m/

Each image in the dataset is about 1MB, which means the dataset as a whole is nearly 7TB, making it infeasible for me to do training in the cloud where I can utilize larger machines. But once gzipped, the latents are only about 100KB each, 10% the size, dropping it to 725GB for the whole dataset. Much more manageable. (Note: I tried zstandard to see if it could compress further, but it resulted in worse compression ratios even at higher settings. Need to investigate.)

The latent is already saved in .npz, i.e. zip format. However, in my experience zipping latents has little effect. Therefore, I am a little skeptical as to how much the size will be reduced even if gzip is used.

Currently, the sd-scripts cache is saved in NumPy's .npz format as float32 or float16. I am considering changing this to the .safetensors format and saving the latents as float16/bfloat16/float8_e4m3fn.
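For a rough sense of what the dtype choice alone buys, here is a minimal sketch that saves the same array as float32 and as float16 and compares the sizes. The array is random data standing in for a latent, and the shape and the `latents` key are assumptions for illustration, not the actual sd-scripts cache layout. bfloat16 and float8_e4m3fn are not native NumPy dtypes, so they are left out here.

```python
import io

import numpy as np

# Random data standing in for a cached latent (hypothetical shape).
rng = np.random.default_rng(0)
latent32 = rng.standard_normal((4, 128, 128)).astype(np.float32)
latent16 = latent32.astype(np.float16)


def npz_size(arr):
    """Return the size in bytes of an uncompressed .npz holding `arr`."""
    buf = io.BytesIO()
    np.savez(buf, latents=arr)
    return len(buf.getvalue())


size32 = npz_size(latent32)
size16 = npz_size(latent16)

# Largest absolute error introduced by the float16 round trip.
max_err = float(np.max(np.abs(latent32 - latent16.astype(np.float32))))

print(f"float32 npz: {size32} bytes")
print(f"float16 npz: {size16} bytes")
print(f"max abs round-trip error: {max_err:.6f}")
```

Halving the bit width halves the file before any compression is applied, at the cost of a small rounding error.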

sdbds commented


image

That said, the .npz files are effectively uncompressed (a compression ratio of 100%): I checked, and the extracted .npy file is exactly the same size as the .npz. Plain zip or gzip compression shrinks the file to about 25% of its original size, i.e. from 1.6 MB to about 400 KB. 7z at its most aggressive setting gets it down to about 20%, i.e. 300 KB from 1.6 MB.
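The observation that the .npz is no smaller than the .npy matches how NumPy behaves: `np.savez` writes an uncompressed zip archive, while `np.savez_compressed` applies deflate. A minimal sketch for checking this follows; the array is random data standing in for a latent, and the shape and the `latents` key are assumptions. Real latents may compress much better or worse than this synthetic example.

```python
import gzip
import io

import numpy as np

# Random data standing in for a cached latent (hypothetical shape).
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 128, 128)).astype(np.float32)


def saved_size(save_fn, **arrays):
    """Write arrays to an in-memory file with save_fn and return its size in bytes."""
    buf = io.BytesIO()
    save_fn(buf, **arrays)
    return len(buf.getvalue())


# np.savez stores members without compression; np.savez_compressed deflates them.
stored_size = saved_size(np.savez, latents=latent)
deflated_size = saved_size(np.savez_compressed, latents=latent)

# Alternative: write a plain .npy and gzip the resulting bytes.
npy_buf = io.BytesIO()
np.save(npy_buf, latent)
gzipped_size = len(gzip.compress(npy_buf.getvalue()))

print(f"raw array bytes:     {latent.nbytes}")
print(f"np.savez:            {stored_size}")
print(f"np.savez_compressed: {deflated_size}")
print(f"gzip of .npy:        {gzipped_size}")
```

Running the same comparison on a few real cache files would show whether switching to `np.savez_compressed` alone is enough for a given dataset.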

Thank you! Sorry, I guess I misunderstood about .npz.

I think the Flux latents compress well because they are converted from bfloat16 to float32 before being saved. That conversion leaves the low 16 bits of every value at zero, so half of each file is trivially compressible.
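The bfloat16 effect can be illustrated without PyTorch. The sketch below emulates a bfloat16-to-float32 conversion by zeroing the low 16 bits of each float32 value (truncation rather than proper rounding, which is an assumption made for simplicity) and compares gzip ratios. The data is random, not real Flux latents, and the shape is hypothetical.

```python
import gzip

import numpy as np

# Random data standing in for a latent (hypothetical shape).
rng = np.random.default_rng(0)
latent = rng.standard_normal((16, 128, 128)).astype(np.float32)

# Emulate "bfloat16 upcast to float32": keep the top 16 bits, zero the rest.
bits = latent.view(np.uint32) & np.uint32(0xFFFF0000)
from_bf16 = bits.view(np.float32)


def gzip_ratio(arr):
    """Return compressed size divided by raw size for the array's bytes."""
    raw = arr.tobytes()
    return len(gzip.compress(raw)) / len(raw)


full_ratio = gzip_ratio(latent)
bf16_ratio = gzip_ratio(from_bf16)

print(f"full-precision float32: {full_ratio:.2f}")
print(f"bfloat16-derived float32: {bf16_ratio:.2f}")
```

If this is what is happening, most of the gain from gzip would also be available by storing the values as 16-bit in the first place.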

image

In my tests the compression ratio was rather poor, but it probably depends on the dataset.

The fp8 format still seems to have advantages. We might also consider .safetensors in combination with zip/gzip.