MCG-NJU/EMA-VFI

Be faster and better

Opened this issue · 1 comment

tomcup commented

Try not using timm; then it would be simpler to port the model to C/C++. I also find initialization very slow, and I don't know why. In the example it seems that the CPU is used; is there a particular reason not to use the GPU? GPUs are usually what gets used for video processing.
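For what it's worth, this is the quick check I run first to see whether PyTorch can see the GPU at all (just my own sketch, not part of the repo):

import torch

# Sketch: if CUDA is not visible here, everything silently runs on the CPU.
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
    print("PyTorch built against CUDA", torch.version.cuda)
else:
    print("No CUDA device visible to PyTorch; the demo would run on the CPU.")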

I've tried the demo: --n=8 takes 30 s and --n=32 takes 2 min (excluding initialization time) on my GTX 1650. That averages 3.75 s per frame, which is much slower than RIFE (VapourSynth-RIFE-ncnn-Vulkan, rife-v4.6 with ensemble=True; I used it on a 2 h 1080p film going from 24 fps to 60 fps, the whole process took about 14 h, roughly 0.12 s per frame). Why is that?
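The per-frame figures above come from this back-of-the-envelope arithmetic (the 60 fps output-frame count is my own estimate):

# My arithmetic behind the comparison above.
ema_n8_per_frame = 30 / 8          # 30 s for --n=8  -> 3.75 s per generated frame
ema_n32_per_frame = 120 / 32       # 2 min for --n=32 -> 3.75 s per generated frame

# RIFE: a 2 h film interpolated from 24 fps to 60 fps finished in about 14 h.
output_frames = 2 * 3600 * 60      # ~432000 output frames at 60 fps
rife_per_frame = 14 * 3600 / output_frames   # ~0.12 s per output frame

print(ema_n8_per_frame, ema_n32_per_frame, round(rife_per_frame, 2))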

tomcup commented

I used screenshots from animated films. I don't expect this model to perform well on animation, but I wanted to see how it handles dark scenes and railings.

These are what I used:
mpv-shot0001
mpv-shot0002

But the GPU's performance surprised me:
image
I just ran python .\demo_Nx.py --n 3, and the GPU does this?
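To see what the GPU is actually doing during the run, I would log the allocator statistics like this (my own sketch; these calls come from torch.cuda, not from demo_Nx.py):

import torch

# Sketch: print what PyTorch has allocated/reserved on the GPU.
# I would call this before and after the forward pass in demo_Nx.py.
def report_gpu_memory(tag=""):
    if not torch.cuda.is_available():
        print(tag, "no CUDA device")
        return
    allocated = torch.cuda.memory_allocated() / 2**20   # MiB held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**20     # MiB held by the caching allocator
    print(f"{tag} allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

report_gpu_memory("startup:")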

Then I tried this:

import cv2
import sys
import torch

import numpy as np
from imageio import mimsave

# Imports from the EMA-VFI repo (the script is run from the repo root).
sys.path.append(".")
import config as cfg
from benchmark.utils.padder import InputPadder

from model import feature_extractor, flow_estimation

# Read the two frames (OpenCV loads them as BGR, H x W x C).
I0 = cv2.imread("example/mpv-shot0001.jpg")
I2 = cv2.imread("example/mpv-shot0002.jpg")

# Convert to batched CHW float tensors in [0, 1] on the GPU.
I0_ = (torch.tensor(I0.transpose(2, 0, 1)).cuda() / 255.0).unsqueeze(0)
I2_ = (torch.tensor(I2.transpose(2, 0, 1)).cuda() / 255.0).unsqueeze(0)

# Pad both frames so height and width are divisible by 32.
padder = InputPadder(I0_.shape, divisor=32)
I0_, I2_ = padder.pad(I0_, I2_)

# Build the model; the commented-out line is the smaller configuration.
backbonetype, multiscaletype = (feature_extractor, flow_estimation)
# backbonecfg, multiscalecfg = cfg.init_model_config(F=16, depth=[2, 2, 2, 2, 2])
backbonecfg, multiscalecfg = cfg.init_model_config(F=32, depth=[2, 2, 2, 4, 4])
net = multiscaletype(backbonetype(**backbonecfg), **multiscalecfg)

# Strip the "module." prefix from checkpoint keys and skip the
# attn_mask / HW entries before loading.
def convert(param):
    return {
        k.replace("module.", ""): v
        for k, v in param.items()
        if "module." in k and "attn_mask" not in k and "HW" not in k
    }

# Load the ours_t weights, switch to eval mode, and move the model to the GPU.
net.load_state_dict(convert(torch.load("ckpt/ours_t.pkl")))
net.eval()
net.to(torch.device("cuda"))

# Forward pass on the concatenated frame pair. Note that this is not wrapped in
# torch.no_grad(), so autograd keeps intermediate activations alive.
imgs = torch.cat((I0_, I2_), 1)
pred = net(imgs)

# Unpad, move to the CPU, convert to HWC uint8, and save with a BGR -> RGB flip.
mid = (
    padder.unpad(pred)[0]
    .detach()
    .cpu()
    .numpy()
    .transpose(1, 2, 0)
    * 255.0
).astype(np.uint8)
mimsave("example/out_2x.jpg", [mid[:, :, ::-1]])

The result is even funnier:

torch.cuda.OutOfMemoryError: CUDA out of memory. 
Tried to allocate 44.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. 
Of the allocated memory 9.82 GiB is allocated by PyTorch, and 186.06 MiB is reserved by PyTorch but unallocated. 
If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How can two ordinary 1080p movie screenshots run the GPU out of memory?
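If it helps, this is how I would retry the same forward pass with autograd disabled, and optionally in half precision (a sketch under my assumptions: net, I0_ and I2_ are the objects from the snippet above, and I have not verified whether fp16 is numerically safe for this model):

import torch

# Sketch: the same forward pass as above, but without autograd bookkeeping.
# Running inference outside torch.no_grad() keeps every intermediate activation
# alive for a backward pass that never happens, which is expensive at 1080p.
with torch.no_grad():
    imgs = torch.cat((I0_, I2_), 1)
    pred = net(imgs)

# Optionally, half precision roughly halves activation memory (accuracy impact unverified):
# with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
#     pred = net(imgs)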