HolyWu/vs-rife

Install problems - solved

Samhayne opened this issue · 20 comments

I ran into two problems installing the current vs-rife version on Windows today that cost me some time to resolve. Here are the solutions:

Problem 1:

Collecting tensorrt>=8.5.2.2 (from vsrife)
  Using cached tensorrt-8.6.1.post1.tar.gz (18 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\Samhayne\AppData\Local\Temp\pip-install-2qleovil\tensorrt_1b55b028ef7f49f3b31b96c06fee5dd8\setup.py", line 103, in <module>
          if disable_internal_pip or nvidia_pip_index_url in parent_command_line() or nvidia_pip_index_url in pip_config_list():
                                                             ^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\Samhayne\AppData\Local\Temp\pip-install-2qleovil\tensorrt_1b55b028ef7f49f3b31b96c06fee5dd8\setup.py", line 96, in parent_command_line
          return subprocess.check_output(["ps", "-p", str(pid), "-o", "command", "--no-headers"]).decode()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "D:\Python\Python311\Lib\subprocess.py", line 466, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "D:\Python\Python311\Lib\subprocess.py", line 548, in run
          with Popen(*popenargs, **kwargs) as process:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "D:\Python\Python311\Lib\subprocess.py", line 1026, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "D:\Python\Python311\Lib\subprocess.py", line 1538, in _execute_child
          hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      FileNotFoundError: [WinError 2] The system cannot find the file specified
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Fixed by installing psutil:

pip install psutil
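
For context: tensorrt's setup.py tries to read the parent process command line, and the "ps" call in the traceback above is Unix-only; with psutil installed it apparently takes a psutil code path instead. Roughly, the psutil-based equivalent looks like this (my own sketch, not the actual setup.py code):

import os
import psutil  # the missing piece on Windows: pip install psutil

def parent_command_line() -> str:
    # Ask psutil for the parent process (pip) and return its command line,
    # instead of shelling out to the Unix-only "ps" tool.
    parent = psutil.Process(os.getpid()).parent()
    return " ".join(parent.cmdline()) if parent else ""

print(parent_command_line())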

Problem 2:

Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting tensorrt-libs
  Using cached tensorrt-libs-8.6.1.tar.gz (6.8 kB)
  Preparing metadata (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: 'D:\Python\Python311\python.exe' -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Samhayne\\AppData\\Local\\Temp\\pip-install-tkxmv8xe\\tensorrt-libs_454e03b7bfb443408d87c2e2b512cda7\\setup.py'"'"'; __file__='"'"'C:\\Users\\Samhayne\\AppData\\Local\\Temp\\pip-install-tkxmv8xe\\tensorrt-libs_454e03b7bfb443408d87c2e2b512cda7\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Samhayne\AppData\Local\Temp\pip-pip-egg-info-fuzvb0d_'
       cwd: C:\Users\Samhayne\AppData\Local\Temp\pip-install-tkxmv8xe\tensorrt-libs_454e03b7bfb443408d87c2e2b512cda7\
  Complete output (15 lines):
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "C:\Users\Samhayne\AppData\Local\Temp\pip-install-tkxmv8xe\tensorrt-libs_454e03b7bfb443408d87c2e2b512cda7\setup.py", line 137, in <module>
      raise RuntimeError(open("ERROR.txt", "r").read())
  RuntimeError:
  ###########################################################################################
  The package you are trying to install is only a placeholder project on PyPI.org repository.
  This package is hosted on NVIDIA Python Package Index.

  This package can be installed as:
  ```
  $ pip install --extra-index-url https://pypi.nvidia.com tensorrt-libs
  ```
  ###########################################################################################

  ----------------------------------------

Looks like there's no Windows build of TensorRT available on PyPI at the moment (see: NVIDIA/TensorRT#2933 (comment)).
(I couldn't get an older version from there either.)

Solution: Install manually from zip as described here:
https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-zip
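
In short, the zip route boils down to something like this (version and paths are just examples; adjust to the zip you downloaded):

cd TensorRT-8.5.2.2
pip install python\tensorrt-8.5.2.2-cp310-none-win_amd64.whl

...and add the zip's lib folder to PATH so the TensorRT DLLs are found at runtime.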

@Samhayne Can you also tell me which version of Python you used? I have applied all of these things, downloaded the manual zip and so on, but it's still not working. I use Python 3.11.4.

To be honest, I tried for two hours and was absolutely not able to install this.

Greetings, Jens.
I eventually used 3.10.11 for compatibility reasons.

I believe the reason was that there was only a wheel for 3.10 (...cp310...) in the releases zip.
(tensorrt-8.5.2.2-cp310-none-win_amd64.whl)

Greetings to you too, Sam,

I think if I work on this some more I could even get it to work, but I don't know if it's worth it.

I actually use Flowframes 1.40 to interpolate videos, but there is a strange bug where it slows down after a while. The same did not happen with the NCNN/Vulkan VapourSynth version. So I wanted to check whether PyTorch is responsible for the slowdown. Also, NCNN/Vulkan in Flowframes is about 10 fps faster and seems to scale with bigger GPU cards, whereas the PyTorch version does not.

To find out why this is happening, I have been researching and trying to find where the problem actually occurs, whether it also happens with HolyWu's vs-rife, and what the performance there is.

Can you perhaps give me a number for how many fps you get on a 720p video with 2x interpolation, and whether there is a slowdown if you let it run for about 20 minutes or more?

Hey Jens,

I can tell you that I don't get a slowdown with 4K videos.
But I'm only using a 1070, where I'm now getting 6 fps (no TRT and with FP16; FP32 would be too memory-hungry for that old card), whereas with the Vulkan-based https://github.com/Asd-g/AviSynthPlus-RIFE I only got 2 fps.
But the quality there is also worse. (It only becomes obvious in a side-by-side comparison, though, and very obvious with fine pattern distortions I can't fight here with my limited settings.)

Hello,

It seems that it is roughly as fast as the Flowframes PyTorch RIFE method.

1280x720 = 921,600 pixels
3840x2160 = 8,294,400 pixels

I get around 39 fps with PyTorch/CUDA at 720p on an NVIDIA RTX 3060 Laptop.
If we divide 39 by 9 (the pixel-count factor between 4K and 720p), that is roughly 4.3 fps.
Since the RTX 3060 Laptop should be a little bit faster than a GTX 1070, this matches up well with the values you provided.
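
Spelled out, that estimate is just the pixel-count ratio (assuming the cost scales roughly linearly with the number of pixels):

# Back-of-the-envelope fps scaling by pixel count
pixels_720p = 1280 * 720           # 921,600
pixels_4k = 3840 * 2160            # 8,294,400
ratio = pixels_4k / pixels_720p    # 9.0

fps_720p = 39                      # measured at 720p on the RTX 3060 Laptop
print(round(fps_720p / ratio, 1))  # ~4.3 fps estimated at 4K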

So HolyWu/vs-rife does not seem to be faster than Flowframes. However, I still need to find out why the problem happens. Let's research more.

Ok. Here are some numbers for you, @jensdraht1999, for my GTX 1070 @ 720p:

clip = RIFE(clip, model='4.6', scale=1.0, num_streams=2, ensemble=True)
@ RGBH (FP16): 20fps
@ RGBS (FP32): 16fps

clip = RIFE(clip, model='4.6', scale=1.0, num_streams=2)
@ RGBH (FP16): 37fps
@ RGBS (FP32): 30fps

Hello @Samhayne,

Thank you very much for the numbers. My numbers with Flowframes:

@ 720p with CUDA/PYTORCH (NVIDIA ONLY) FP32 : 48 FPS
@ 720p with CUDA/PYTORCH (NVIDIA ONLY) FP16: 47 FPS

@ 720p with VULKAN/NCNN/VAPOURSYNTH FP32: 53 FPS
@ 720p with VULKAN/NCNN/VAPOURSYNTH FP16: 52 FPS

I do not even know if those half-precision settings actually apply; it seems not.

The odd thing is that the Vulkan/NCNN implementation in VapourSynth is much faster and does not get slower over time; it takes about 30 minutes to do the job, while CUDA/PyTorch takes 2 hours. And there is no waiting for the frames to be extracted first and encoded to video afterwards, because it's all done on the fly.

VULKAN/NCNN/VAPOURSYNTH has been used with 8 threads.

@Samhayne I wanted to use this for testing, but I could not make it work.
This is what I tried:

1.) Install Python 3.10.11
2.) Install CUDA 11.7
3.) Install cuDNN (which just means copying the files)
4.) Extract the TensorRT zip to a Desktop folder
5.) Add the TensorRT folder to PATH
6.) Install the tensorrt-8.x.y.z-cp310-none-win_amd64.whl file via pip
7.) Run the __main__.py file to download all the flownet.pkl files
8.) Then I run the __init__.py file and get the following error:

Traceback (most recent call last):
  File "C:\Users{UserName}\Desktop\OLD\vs-rife-master\vsrife\__init__.py", line 14, in <module>
    from torch_tensorrt.fx import LowerSetting
  File "C:\Users{UserName}\AppData\Local\Programs\Python\Python310\lib\site-packages\torch_tensorrt\__init__.py", line 84, in <module>
    from torch_tensorrt._compile import * # noqa: F403
  File "C:\Users{UserName}\AppData\Local\Programs\Python\Python310\lib\site-packages\torch_tensorrt\_compile.py", line 9, in <module>
    import torch_tensorrt.ts
ModuleNotFoundError: No module named 'torch_tensorrt.ts'

I do not understand what I am supposed to do.

Do you have an example script that works?

@jensdraht1999 Hm. Did you install PyTorch?
PyTorch 1.13
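
(For a CUDA 11.7 setup, something along these lines should get you a matching build - but double-check the exact command on pytorch.org for your configuration:)

pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117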

You shouldn't need to run any of the scripts manually to install vsrife.

From the main page:

pip install -U vsrife
...should install vsrife.

python -m vsrife
...should fetch the models.

@Samhayne You mean after step 5, right? I mean, for the commands to run...

Yes, I “just” installed the mentioned dependencies as described on their project pages and then invoked the two lines to install vsrife and fetch the models.
(…and ran into the problems I mentioned in my first post here)

OK, perfect, then I will try again with PyTorch installed. I had installed PyTorch 2.0.1; now I will try 1.13.

Thank you very much!!!

@Samhayne I think I have solved all the installation problems. However, how are you using this exactly? I have no experience with VapourSynth in general.

I have a script called interpolate.py with the following content:

import vapoursynth as vs
core = vs.core
from vsrife import rife

clip = video_in
clip = clip.resize.Bicubic(format=vs.RGBS, matrix_in_s="709")
clip = rife(clip,trt=True,factor_num=5,factor_den=1)
clip = clip.resize.Bicubic(format=vs.YUV420P8, matrix_s="709")
clip.set_output()

Then I start it like this?

vspipe -c y4m "C:\Users{UserName}\Desktop\vsss\interpolate.py" - | "C:\Users\King\Desktop\vsss\ffmpeg.exe" -i - "C:\Users\King\Desktop\vsss\test.mp4"

Hey @jensdraht1999,
I also didn't use VapourSynth before vs-rife.

I'm using StaxRip for my encodings and rewired the dependency paths so I could use vs-rife.
That way it's embedded in an encoding UI, and I also get a nice preview and can add different RIFE settings as "filters".
It's sort of a challenge of its own, though.
Nonetheless...

This is the (partly generated) script I'm using for 4K videos - you would just need to adapt the paths:

import os, sys
import vapoursynth as vs
core = vs.core

sys.path.append(r"E:\StaxRip\StaxRip 2.29.0-x64 (VapourSynth Py 3.11)\Apps\Plugins\VS\Scripts")
core.std.LoadPlugin(r"E:\StaxRip\StaxRip 2.29.0-x64 (VapourSynth Py 3.11)\Apps\Plugins\Dual\L-SMASH-Works\LSMASHSource.dll", altsearchpath=True)
clip = core.lsmas.LibavSMASHSource(r"E:\_rife\video.mp4")
core.std.LoadPlugin(r"P:\_Apps\StaxRip\StaxRip 2.29.0-x64\Apps\Plugins\VS\MiscFilters\MiscFilters.dll", altsearchpath=True)
from vsrife import rife
import torch
#os.environ["CUDA_MODULE_LOADING"] = "LAZY"
clip = vs.core.resize.Bicubic(clip, format=vs.RGBH, matrix_in_s="709")
clip = rife(clip, model='4.6', scale=0.5, num_streams=1, sc=True, sc_threshold=0.12, ensemble=True, trt=True)
clip = vs.core.resize.Bicubic(clip, format=vs.YUV420P8, matrix_s="709")
clip.set_output()

...and under the hood it is invoked (grabbed from the log output), depending on the chosen piping, by...

vspipe.exe "script.vpy" - --container y4m | "E:\StaxRip\StaxRip 2.29.0-x64 (VapourSynth Py 3.11)\Apps\Encoders\x265\x265.exe" --y4m --output "E:\_rife\video.hevc" -

-or-

"ffmpeg.exe" -f vapoursynth -i "E:\_rife\script.vpy" -f yuv4mpegpipe -strict -1 -loglevel fatal -hide_banner - | "E:\StaxRip\StaxRip 2.29.0-x64\Apps\Encoders\x265\x265.exe" --y4m --output "E:\_rife\video.hevc" -

-or-

requires DJATOM/x265-aMod or Patman86/x265-Mod-by-Patman:
"E:\StaxRip\StaxRip 2.29.0-x64\Apps\Encoders\x265\x265.exe" --reader-options library=D:\VapourSynth\core\VSScript.dll "E:\_rife\video.hevc" "script.vpy"

I hope this was still helpful.

@Samhayne Thanks. I will have a look and see what can be done. Thanks again!

@Samhayne

I have followed your instructions and it works too.

For example:
Flowframes with auto-encode and 3x interpolation (no TRT, made torch 2.0.1 compatible by me, RIFE 4.6) = 77-81 FPS / duration: 23:00 min

This here:
cd "C:\Users{UserName}\Desktop\vss"
vspipe -c y4m script2.vpy - | "C:\Users{UserName}\Desktop\vss\StaxRip-v2.29.0-x64\Apps\Encoders\x264\x264.exe" --preset ultrafast --demuxer y4m - --output "C:\Users{UserName}\Desktop\vss\testout.mp4" "C:\Users{UserName}\Desktop\vss\test.mp4"

with the following modified script:
import os, sys
import vapoursynth as vs
core = vs.core

sys.path.append(r"C:\Users{UserName}\Desktop\vss\StaxRip-v2.29.0-x64\Apps\Plugins\VS\Scripts")
core.std.LoadPlugin(r"C:\Users{UserName}\Desktop\vss\StaxRip-v2.29.0-x64\Apps\Plugins\Dual\L-Smash-Works\LSMASHSource.dll", altsearchpath=True)
clip = core.lsmas.LibavSMASHSource(r"C:\Users{UserName}\Desktop\vss\test.mp4")
core.std.LoadPlugin(r"C:\Users{UserName}\Desktop\vss\StaxRip-v2.29.0-x64\Apps\Plugins\VS\MiscFilters\MiscFilters.dll", altsearchpath=True)
from vsrife import rife
import torch
#os.environ["CUDA_MODULE_LOADING"] = "LAZY"
clip = vs.core.resize.Bicubic(clip, format=vs.RGBH, matrix_in_s="709")
clip = rife(clip, model='4.6', num_streams=12, sc=True, sc_threshold=0.20, ensemble=False, trt=True, factor_num=3)
clip = vs.core.resize.Bicubic(clip, format=vs.YUV420P8, matrix_s="709")
clip.set_output()

This runs at about 112 fps, so it takes around 15 minutes.

The reason I have set num_streams=12 is that every stream takes around 0.42 GB, so it uses about 5 GB of VRAM, and it gives a significant boost, at least on 720p video. If the number of streams is set lower, it's slower, around 80 fps.

So yeah, TRT really gives this a boost of about 33 fps, which is roughly a 40% increase (79 FPS x 1.40 ≈ 110.6 FPS).

By the way, RIFE 4.7 is somehow a lot slower; I am getting around 80 FPS with it. The difference is quite big, perhaps a bug? @HolyWu

@jensdraht1999 I didn't use RIFE 4.7 much, as it gives me more artifacts than 4.6 (or 4.0, which is even more artifact-resistant but unfortunately has no TRT support). I don't encode anime either.
BUT: According to this post (https://www.svp-team.com/forum/viewtopic.php?pid=83226#p83226), 4.7 to 4.9 just seem to be more demanding.
They use https://github.com/AmusementClub/vs-mlrt by now, as far as I know.
So I doubt it's a bug in vs-rife.

@Samhayne Yeah, you are probably right, that might be the reason it performs so badly. By the way, here is the full workflow for interpolating a video with audio:

cd "C:\Users{UserName}\Desktop\vss"
vspipe -c y4m script.vpy - | "C:\Users{UserName}\Desktop\vss\ffmpeg.exe" -i - -y -c:v libx264 -preset fast "C:\Users{UserName}\Desktop\vss\testout.mp4"
"C:\Users{UserName}\Desktop\vss\ffmpeg.exe" -y -i "C:\Users{UserName}\Desktop\vss\test.mp4" -codec:a aac "C:\Users{UserName}\Desktop\vss\testout.aac"
"C:\Users{UserName}\Desktop\vss\ffmpeg.exe" -y -i "C:\Users{UserName}\Desktop\vss\testout.mp4" -i "C:\Users{UserName}\Desktop\vss\testout.aac" -c:v copy -c:a aac "C:\Users{UserName}\Desktop\vss\output.mp4"

Script.vpy:
import os, sys
import vapoursynth as vs
core = vs.core

sys.path.append(r"C:\Users{UserName}\Desktop\vss\StaxRip-v2.29.0-x64\Apps\Plugins\VS\Scripts")
core.std.LoadPlugin(r"C:\Users{UserName}\Desktop\vss\StaxRip-v2.29.0-x64\Apps\Plugins\Dual\L-Smash-Works\LSMASHSource.dll", altsearchpath=True)
clip = core.lsmas.LibavSMASHSource(r"C:\Users{UserName}\Desktop\vss\test.mp4")
core.std.LoadPlugin(r"C:\Users{UserName}\Desktop\vss\StaxRip-v2.29.0-x64\Apps\Plugins\VS\MiscFilters\MiscFilters.dll", altsearchpath=True)
from vsrife import rife
import torch
#os.environ["CUDA_MODULE_LOADING"] = "LAZY"
clip = vs.core.resize.Bicubic(clip, format=vs.RGBH, matrix_in_s="709")
clip = rife(clip, model='4.6', num_streams=12, sc=True, sc_threshold=0.20, ensemble=False, trt=True, factor_num=3)
clip = vs.core.resize.Bicubic(clip, format=vs.YUV420P8, matrix_s="709")
clip.set_output()
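
As a possible simplification (untested, same placeholder paths as above): the three ffmpeg calls can usually be collapsed into a single invocation that maps the video from the vspipe output and the audio from the original file:

vspipe -c y4m script.vpy - | "C:\Users{UserName}\Desktop\vss\ffmpeg.exe" -y -i - -i "C:\Users{UserName}\Desktop\vss\test.mp4" -map 0:v -map 1:a -c:v libx264 -preset fast -c:a aac "C:\Users{UserName}\Desktop\vss\output.mp4"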