pinterf/FFT3dGPU

precision=0 broken for 4k content


Selur commented

Using v0.8.6:

ClearAutoloadDirs()
LoadPlugin("I:\Hybrid\64bit\Avisynth\AVISYN~1\LoadDll.dll")
LoadDLL("I:\Hybrid\64bit\Avisynth\avisynthPlugins\d3d9.dll")
LoadPlugin("I:\Hybrid\64bit\Avisynth\AVISYN~1\FFT3dGPU.dll")
BlankClip(length=3000, width=4096, height=2160, fps=25, color=$000000,pixel_type="YV12")
FFT3DGPU(precision=0)
return last

I get:
[Screenshot: precision0]
The pattern changes depending on the block and mode settings, etc.

I tested a few 4K videos and different source filters.

It also happens with 0.8.5, 0.8.4, 0.8.3 and the old v0.8.2 (x64) from http://avisynth.nl/index.php/FFT3DGPU.

Hmm. The total size matters, i.e. both width and height. I've read that DX9 has a maximum texture size of 4096x4096, but we are below that:

BlankClip(length=3000, width=3840-16, height=2048, fps=25, color=$000000,pixel_type="YV12") # good
BlankClip(length=3000, width=3840, height=2048, fps=25, color=$000000,pixel_type="YV12") # bad
BlankClip(length=3000, width=3840-16, height=2048+16, fps=25, color=$000000,pixel_type="YV12") # good
BlankClip(length=3000, width=3840-16, height=2048+24, fps=25, color=$000000,pixel_type="YV12") # bad

Hi,

I can confirm the problem with 4K content.
I mostly get partially black frames, but also frames like Selur's, with awful blocking, grid errors or missing colour information, depending on the bw/bh, mode and precision configuration.

I found the following when the resolution is at least 3840x2160 (I always used plane=4):
mode 0-2: needs precision >= 1, otherwise picture errors
mode 2: bw/bh < 64 always gives picture errors
modes 0 and 2: bw/bh 256 needs precision = 2, otherwise picture errors
modes 0 and 2: bw/bh 512 always gives picture errors
mode 1: bw/bh 512 needs precision = 2, otherwise picture errors

Precision has to be increased as the bw/bh block size grows; otherwise the artifacts described above appear.

But mode 2 seems to have a special problem. To me it looks as if precision=0 is always used internally when bw/bh < 64.
This is particularly annoying because bw/bh 32 is the default block size.

[Screenshot: mode2_bw16_bh16_precision2_3840x2160_8bit] (always errors in mode 2 when bw/bh < 64, independent of precision)

[Screenshot: mode2_bw32_bh32_precision1_3840x2160_8bit] (always errors in mode 2 when bw/bh < 64, independent of precision)

[Screenshot: mode2_bw256_bh256_precision1_3840x2160_8bit] (block errors seem to appear only in bright areas)

[Screenshot: mode1_bw512_bh512_precision1_3840x2160_8bit] (grid errors only occur in dark areas)

The maximum texture size on my system is 8192x8192, twice the 4096 limit, so it must be enough; the problem has to come from a different source.
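Put together, the findings listed above suggest parameter combinations like the following for 3840x2160 8-bit input. This is only a sketch that restates the reported constraints (precision >= 1 in every mode, bw/bh >= 64 in mode 2, precision=2 for 256 blocks in modes 0 and 2 and for 512 blocks in mode 1); it assumes FFT3DGPU.dll and d3d9.dll are loaded as in the first script and uses BlankClip only as a stand-in for a real source. The commented-out alternatives have not been tested beyond what is reported in this thread.

```avisynth
# Stand-in 4K 8-bit source; replace with the real source filter
BlankClip(length=3000, width=3840, height=2160, fps=25, color=$000000, pixel_type="YV12")
# Mode 1 at the default 32x32 blocks: precision >= 1 was reported as sufficient
FFT3DGPU(mode=1, bw=32, bh=32, precision=1, plane=4)
# Alternatives derived from the reported constraints (enable one instead of the line above):
#FFT3DGPU(mode=2, bw=64, bh=64, precision=1, plane=4)   # mode 2 needs bw/bh >= 64
#FFT3DGPU(mode=0, bw=256, bh=256, precision=2, plane=4) # 256 blocks in modes 0/2 need precision=2
#FFT3DGPU(mode=1, bw=512, bh=512, precision=2, plane=4) # 512 blocks in mode 1 need precision=2
# Avoid bw/bh=512 in modes 0 and 2: it failed with every precision
return last
```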

Selur commented

Not sure if it helps, but if I use:

LWLibavVideoSource("G:\TESTCL~1\files\MPEG-4~1.264\4k\4K_SAM~1.MP4",cache=false,format="YUV420P16", prefer_hw=0,repeat=true)
FFT3DGPU(precision=0)

instead of

LWLibavVideoSource("G:\TESTCL~1\files\MPEG-4~1.264\4k\4K_SAM~1.MP4",cache=false,format="YUV420P8", prefer_hw=0,repeat=true)
FFT3DGPU(precision=0)

the output seems fine.
-> So maybe the issue is in the 8-bit to 16-bit conversion.

Since FLOAT16 precision (s10e5) is not enough for 10+ bit content, formats above 8 bits always use real 32-bit float internally, regardless of the precision parameter.
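A minimal sketch of the resulting workaround for 8-bit sources, assuming AviSynth+ (whose built-in ConvertBits changes the bit depth) and that FFT3DGPU 0.8.6 accepts 16-bit YUV, as Selur's test suggests: widen to 16 bits so the filter takes its 32-bit float path, then convert back to 8 bits with dithering. The source line is copied from Selur's example.

```avisynth
LWLibavVideoSource("G:\TESTCL~1\files\MPEG-4~1.264\4k\4K_SAM~1.MP4", cache=false, format="YUV420P8", prefer_hw=0, repeat=true)
ConvertBits(16)           # 8-bit -> 16-bit; >8-bit input is processed in 32-bit float
FFT3DGPU(precision=0)     # precision is ignored for >8-bit input, per the note above
ConvertBits(8, dither=0)  # back to 8-bit with ordered dither to limit banding
return last
```

Requesting format="YUV420P16" directly from LWLibavVideoSource, as in Selur's test, does the same without the first ConvertBits call.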

Selur commented

Okay, so feeding 10+ bit content basically means that 32-bit float is used internally, which brings us back to the start: precision=0 is broken somehow.

Yep. Today I spent four hours on the topic, but made no progress. The easy way out would be to disable 16-bit half float for mode 1 (overlaps) and force 32-bit, but that would not be a noble deed. Everything has a reason in our known world, and I'd like to know what this one is. But it takes time.

Selur commented

Thanks for looking into it.
I'm keeping my fingers crossed that you can find the problem.
(Also hoping for a VapourSynth port of the filter, but that is a totally different thing ;))

Off-topic: First-world problems. I'd better spend my time helping Ukrainian refugees :(


If you need some motivating words: the main reason I use fft3dgpu, and why I started discussing this with Selur, is that fft3dgpu is still a very fast noise filter (faster than most newer GPU-based filters) that improves the compressibility of most videos by 10-35% without damaging video quality much when using sigma values of 1.0-2.0.
Instead of very slow codec settings, which eat up twice the time and energy or even more, it can be used to reduce file size without losing much time. When I use fft3dgpu on 4K 10-bit material, my system's power consumption increases by at most 30 W, roughly 10%, and in most cases computing time increases by maybe 1-5%.

So roughly 40% of the energy can be saved compared to slow encoder settings that take twice as long.
Any efficient tool can help reach energy independence faster and put less money into dictators' wallets.
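A quick check of the energy estimate, assuming energy = power × time, a baseline power P and encoding time T for normal settings, the upper figures quoted above for fft3dgpu (+10% power, +5% time), and slow settings that double the time at the same power:

```latex
\begin{aligned}
E_{\text{fft3dgpu}} &\approx (1.10\,P)(1.05\,T) = 1.155\,PT \\
E_{\text{slow}} &\approx P \cdot 2T = 2\,PT \\
1 - \frac{E_{\text{fft3dgpu}}}{E_{\text{slow}}} &\approx 1 - \frac{1.155}{2} \approx 0.42
\end{aligned}
```

With a 1% time overhead instead of 5%, the same calculation gives about 44%, so the ~40% figure is, if anything, slightly conservative.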

Yep. It's totally weird that only 16-bit half float has problems. I'm trying to set the parameters so that internal processing is minimal (no sharpening, etc.), but no luck yet.

I don't know if this is helpful, but I found that the grid errors shown in the screenshot "mode1_bw512_bh512_precision1_3840x2160_8bit (grid errors only occur in dark areas)" disappear with mode 1, precision 1 and bw=bh=512 when I set the overlap parameters ow=256 and oh=256 and wintype=2.
Every other setting seems to produce those grid errors.
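For reference, the settings reported above as free of grid errors, written as a single filter call. It assumes plane=4 as in the earlier tests, the plugins loaded as in the first script, and BlankClip as a stand-in source; it was observed on a single system, so treat it as a starting point rather than a general fix.

```avisynth
# Stand-in 4K 8-bit source; replace with the real source filter
BlankClip(length=3000, width=3840, height=2160, fps=25, color=$000000, pixel_type="YV12")
# mode 1, precision 1, 512x512 blocks with half-block overlap and wintype=2
FFT3DGPU(mode=1, precision=1, bw=512, bh=512, ow=256, oh=256, wintype=2, plane=4)
return last
```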

Maybe this is also interesting for you: feeding fft3dgpu 10-bit input fights banding and grid errors:
https://forum.selur.net/thread-3019.html

Thanks. As I remember from my previous tests, the size alone (e.g. hitting a hardware or API limit) cannot be the cause of the phenomenon.