Release SD Pipeline VRAM from CUDA cache after generating samples

As of 46aee85, when images are sampled during training, CUDA keeps the (now unused) pipeline data cached in VRAM after sample_images returns. This can cause overcommit (8.5~8.9 / 8.0 GB in my case), which slows down training as well as any other applications using the graphics card, because of constant VRAM<->RAM swapping.
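
To check whether the extra usage is cached rather than live memory, PyTorch's own counters can be compared before and after sampling. The helper below is only a sketch: the name vram_report is hypothetical and not part of sd-scripts, and it reports nothing on machines without PyTorch or CUDA. Memory that is reserved but not allocated is what torch.cuda.empty_cache() can hand back to the driver.

```python
try:
    import torch  # optional, so the sketch also runs where PyTorch is not installed
except ImportError:
    torch = None


def vram_report():
    """Return (allocated_gib, reserved_gib) for the current CUDA device.

    Hypothetical helper, not part of sd-scripts. Returns None when
    PyTorch or a CUDA device is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    # allocated: memory held by live tensors
    # reserved:  allocated memory plus blocks cached by PyTorch's allocator
    return (
        torch.cuda.memory_allocated() / gib,
        torch.cuda.memory_reserved() / gib,
    )


if __name__ == "__main__":
    print(vram_report())
```

These counters cover only the current process, so they will not match a system-wide monitor exactly.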

Unloading the pipeline and clearing the CUDA cache should mitigate this issue. Add the following two lines before sample_images exits:

del pipeline
torch.cuda.empty_cache()

They go just before this line (sd-scripts/library/train_util.py, line 2359 in 46aee85):

torch.set_rng_state(rng_state)

With this change, VRAM usage after sample_images returns should stay the same as it was before the call (7.0~7.2 / 8.0 GB in my case).
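
The cleanup pattern can be sketched in isolation as follows. This is not the actual sample_images code: DummyPipeline and sample_images_sketch are hypothetical stand-ins, and the torch import is optional so the sketch runs without a GPU. The point it illustrates is the ordering: the reference to the pipeline is dropped first, because empty_cache() can only release blocks that no live tensor is still using.

```python
import gc
import weakref

try:
    import torch  # optional, so the sketch also runs where PyTorch is not installed
except ImportError:
    torch = None


class DummyPipeline:
    """Hypothetical stand-in for the SD pipeline built inside sample_images."""

    def __call__(self, prompt):
        return f"image for {prompt!r}"


def sample_images_sketch(prompts):
    pipeline = DummyPipeline()
    tracker = weakref.ref(pipeline)  # lets the caller verify the pipeline was freed

    images = []
    for prompt in prompts:
        images.append(pipeline(prompt))

    # Drop the last reference to the pipeline, then return the cached
    # (now unused) blocks to the driver.
    del pipeline
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()

    return images, tracker


if __name__ == "__main__":
    images, tracker = sample_images_sketch(["a cat"])
    print(images, tracker() is None)
```

If anything else still holds a reference to the pipeline (for example a variable in the caller), the del alone will not free its memory and empty_cache() will have little effect.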

Thank you for letting me know. I will add this code in the next update.

Fixed in the latest commit :)