arenasys/stable-diffusion-webui-model-toolkit

Feature request

Closed this issue · 3 comments

It seems that you're the right project to ask for such a thing.

I have about a terabyte of SDXL checkpoints. Sometimes I want to test one prompt with all of them via the X/Y/Z plot. It takes an hour.

I noticed that some models tend to load faster if you do it in the right order - since they share some chunks of data (e.g. one is in fact a parent of another). So there is a partial tree here, or perhaps a DAG.

The feature would be to take my collection of checkpoints, scan them, and propose an ordering that minimizes total loading time. It could return the result in CSV format so that I can paste it straight into the X/Y/Z plot form.
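As a rough illustration of the output side of this request, here is a minimal sketch that lists the checkpoint files in a folder and joins their names into one comma-separated line. The function name `checkpoint_axis_values` is hypothetical, the alphabetical order is only a placeholder (it does not optimize loading time in any way), and it is an assumption that the X/Y/Z plot field accepts file names in this form.

```python
from pathlib import Path


def checkpoint_axis_values(folder, extensions=(".safetensors", ".ckpt")):
    """Return checkpoint file names in `folder` as one comma-separated line.

    Hypothetical helper: names are sorted alphabetically as a placeholder
    ordering; nothing here measures or minimizes loading time.
    """
    names = sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    return ", ".join(names)
```

File names that themselves contain commas would need quoting or renaming before being pasted anywhere that splits on commas.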

Or perhaps there is another, smarter way of running my prompt on multiple checkpoints, one that doesn't depend so much on reading large amounts of data from disk.

cse84 commented

If you have that many checkpoints, the simplest solution would be to invest in a fast SSD (at a read speed of 3 GB/s, reading a terabyte should take about 333 seconds, not an hour). Reading a checkpoint can't be avoided if you want to test it in any way. The only way to make reading faster would be to extract the UNets and quantize them, but that might also reduce their quality.
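The read-time estimate above is simply size divided by sustained throughput. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and a purely sequential read; the function name is illustrative:

```python
def estimated_read_seconds(total_bytes: float, bytes_per_second: float) -> float:
    """Time to read `total_bytes` sequentially at a sustained throughput."""
    if bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return total_bytes / bytes_per_second


TB = 1e12  # decimal terabyte, in bytes
GB = 1e9   # decimal gigabyte, in bytes

# One terabyte at 3 GB/s, the figure quoted in the comment.
ssd_seconds = estimated_read_seconds(1 * TB, 3 * GB)
print(f"{ssd_seconds:.0f} s")  # 333 s

# Throughput that would make one terabyte take a full hour, in MB/s.
hour_throughput_mb = (1 * TB / 3600) / 1e6
print(f"{hour_throughput_mb:.0f} MB/s")  # 278 MB/s
```

The second figure only applies if the whole hour were spent reading from disk; any time spent generating images is not accounted for here.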

I have since done that. Still, this is a computational complexity issue, which, as I was taught, we solve by improving algorithms, not by upgrading hardware. Otherwise, a year from now someone will show up with 10 or 100 TB of data to read, and SSDs of that size are currently out of a consumer's reach. I think we are overlooking my observation that the loading order affects the load time. I also noticed that some models, such as realvisxlV50, tend to yield different results (different images) if I load them after specific other models (in this case realvisxlV20). I don't know why this happens, whether it is expected or a problem with my configuration, but it should be either eliminated or controlled. Surely, if some data stays in memory after a model is unloaded, it may also affect performance.

Checkpoints are just files; you will need to load them all from disk if you want to use them all. They don't share data: each checkpoint file is self-contained.
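For anyone who wants to check how much two checkpoint files overlap at the tensor level, here is a minimal sketch. It assumes both files are in the `.safetensors` format (an 8-byte little-endian header length, a JSON header with per-tensor `data_offsets`, then the raw tensor bytes) and does not handle pickle-based `.ckpt` files. The function names are hypothetical.

```python
import hashlib
import json
import struct


def tensor_digests(path):
    """Map tensor name -> SHA-256 of its raw bytes in a .safetensors file."""
    digests = {}
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data_start = 8 + header_len  # offsets are relative to the end of the header
        for name, info in header.items():
            if name == "__metadata__":  # optional free-form metadata, not a tensor
                continue
            begin, end = info["data_offsets"]
            f.seek(data_start + begin)
            digests[name] = hashlib.sha256(f.read(end - begin)).hexdigest()
    return digests


def shared_fraction(path_a, path_b):
    """Fraction of tensors in A that appear byte-identical under the same name in B."""
    a, b = tensor_digests(path_a), tensor_digests(path_b)
    if not a:
        return 0.0
    same = sum(1 for name, digest in a.items() if b.get(name) == digest)
    return same / len(a)
```

Note that this comparison itself reads both files in full, so even when two files turn out to contain identical tensors, each file is still a separate copy on disk that has to be read.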

Behavior like "yield different results (different images) if I load them after specific other models" is due to bugs or poorly written code in the interface you are using (a1111 etc.) and to corrupted checkpoint files (missing data). Varying loading times can be due to file caching (at the OS or hardware level), corrupted checkpoints, etc.