I am trying to run this model on the VideoLQ dataset, but StableVSR runs out of memory (OOM) in the raft_large model.

Question

I am trying to run this model on the VideoLQ dataset, but StableVSR hits a CUDA out-of-memory (OOM) error inside the raft_large optical flow model.

DaramGC opened this issue 2 months ago · 2 comments

How can I reduce the memory usage? Thank you.

Error Message

/home/hjh9902/.conda/envs/stablevsr/lib/python3.8/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Loading pipeline components...: 100%|██████████| 6/6 [00:26<00:00,  4.34s/it]
You have disabled the safety checker for <class 'pipeline.stablevsr_pipeline.StableVSRPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
/home/hjh9902/.conda/envs/stablevsr/lib/python3.8/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "test.py", line 64, in <module>
    frames = pipeline('', frames, num_inference_steps=args.num_inference_steps, guidance_scale=0, of_model=of_model).images
  File "/home/hjh9902/.conda/envs/stablevsr/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/hjh9902/evaluation/models/StableVSR/pipeline/stablevsr_pipeline.py", line 962, in __call__
    forward_flows, backward_flows = self.compute_flows(of_model, upscaled_images, rescale_factor=of_rescale_factor)
  File "/home/hjh9902/evaluation/models/StableVSR/pipeline/stablevsr_pipeline.py", line 712, in compute_flows
    bflow = of.get_flow(of_model, prev_image, cur_image, rescale_factor=rescale_factor)
  File "/home/hjh9902/evaluation/models/StableVSR/util/flow_utils.py", line 43, in get_flow
    flows = of_model(target, source)
  File "/home/hjh9902/.conda/envs/stablevsr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hjh9902/.conda/envs/stablevsr/lib/python3.8/site-packages/torchvision/models/optical_flow/raft.py", line 484, in forward
    self.corr_block.build_pyramid(fmap1, fmap2)
  File "/home/hjh9902/.conda/envs/stablevsr/lib/python3.8/site-packages/torchvision/models/optical_flow/raft.py", line 372, in build_pyramid
    corr_volume = self._compute_corr_volume(fmap1, fmap2)
  File "/home/hjh9902/.conda/envs/stablevsr/lib/python3.8/site-packages/torchvision/models/optical_flow/raft.py", line 416, in _compute_corr_volume
    corr = torch.matmul(fmap1.transpose(1, 2), fmap2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 21.97 GiB (GPU 0; 79.19 GiB total capacity; 39.70 GiB already allocated; 21.95 GiB free; 55.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

DaramGC commented 2 months ago

Thx!

Answer 1 · 2024-10-01T12:59:31.000Z

Hi, I also ran into this issue with raft_large when upscaling larger frames. You can try raft_small instead of raft_large. See the documentation here. Even though the large version was used during training, switching to the small one shouldn't change the results much.
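For a rough sense of where the 21.97 GiB allocation in the traceback comes from, here is a back-of-the-envelope sketch. The failing line is the `torch.matmul` in `_compute_corr_volume`, which builds an all-pairs correlation between the two feature maps. The sketch assumes float32 features at 1/8 of the input resolution and a batch of one. The helper name `corr_volume_gib` is made up for illustration, and the 2560x1920 frame size is my guess (it happens to reproduce the number in the error), not something stated in the issue.

```python
def corr_volume_gib(height, width, batch=1, stride=8, bytes_per_el=4):
    """Estimate the size in GiB of an all-pairs correlation volume.

    Assumptions (not taken from the issue): feature maps are `stride`
    times smaller than the input frame and are stored as float32.
    """
    # Number of spatial positions in one feature map.
    n = (height // stride) * (width // stride)
    # Every position is correlated with every other one: n * n entries.
    return batch * n * n * bytes_per_el / 2**30


# Hypothetical frame sizes, chosen only for illustration.
print(f"{corr_volume_gib(1920, 2560):.2f} GiB")  # 21.97 GiB
print(f"{corr_volume_gib(960, 1280):.2f} GiB")   # 1.37 GiB
```

Under these assumptions the tensor grows with the square of the pixel count, so halving both frame dimensions cuts it by a factor of 16. If the OOM persists after switching models, it may be worth checking the resolution at which the flow is computed (the traceback shows an `of_rescale_factor` argument being passed to `compute_flows`, though I have not verified how it is applied).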