precision=0 broken for 4k content

Question

Selur opened this issue 3 years ago · 14 comments

Using v0.8.6:

ClearAutoloadDirs()
LoadPlugin("I:\Hybrid\64bit\Avisynth\AVISYN~1\LoadDll.dll")
LoadDLL("I:\Hybrid\64bit\Avisynth\avisynthPlugins\d3d9.dll")
LoadPlugin("I:\Hybrid\64bit\Avisynth\AVISYN~1\FFT3dGPU.dll")
BlankClip(length=3000, width=4096, height=2160, fps=25, color=$000000,pixel_type="YV12")
FFT3DGPU(precision=0)
return last

I get:

The pattern changes depending on the block size, mode and other settings.

Tested a few 4k videos and different source filters.

Also happens with 0.8.5, 0.8.4, 0.8.3 and old v0.8.2 (x64) from http://avisynth.nl/index.php/FFT3DGPU.

Answer 1 · 2022-05-03T07:22:48.000Z

Hmm. The total size matters, both width and height. I've read that DX9 has a maximum texture size of 4096x4096, but we are well below that:

BlankClip(length=3000, width=3840-16, height=2048, fps=25, color=$000000,pixel_type="YV12") # good
BlankClip(length=3000, width=3840, height=2048, fps=25, color=$000000,pixel_type="YV12") # bad
BlankClip(length=3000, width=3840-16, height=2048+16, fps=25, color=$000000,pixel_type="YV12") # good
BlankClip(length=3000, width=3840-16, height=2048+24, fps=25, color=$000000,pixel_type="YV12") # bad

Answer 2 · 2022-05-03T14:41:52.000Z

Hi,

I can confirm the problem with 4K content.
I mostly get partially black frames, but also frames like Selur's, awful blocking, grid errors or missing colour information, depending on the configuration of bw/bh, mode and precision.

I found the following when the resolution is at least 3840x2160 (I always used plane=4):
mode 0-2: needs precision >= 1, otherwise picture errors
mode 2: bw/bh < 64 always picture errors
mode 0+2: bw/bh 256 needs precision = 2, otherwise picture errors
mode 0+2: bw/bh 512 always picture errors
mode 1: bw/bh 512 needs precision = 2, otherwise picture errors

Precision needs to be increased with growing bw/bh block size, otherwise there are the described artifacts.
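To re-check the combinations listed above in a systematic way, the test scripts could be generated rather than written by hand. The following is only a sketch: the helper names `make_script` and `sweep`, the output file names and the clip length are made up for illustration, and it assumes that FFT3DGPU is autoloaded (otherwise the LoadPlugin lines from the first post have to be added). Only parameters already mentioned in this thread (mode, bw, bh, precision, plane) are used.

```python
from itertools import product


def make_script(mode, block, precision, width=3840, height=2160):
    """Return the text of one AviSynth test script (hypothetical helper)."""
    lines = [
        # Same kind of synthetic source as in the first post, only shorter.
        f'BlankClip(length=100, width={width}, height={height}, fps=25, '
        f'color=$000000, pixel_type="YV12")',
        # plane=4 matches the setting used for the tests described above.
        f"FFT3DGPU(mode={mode}, bw={block}, bh={block}, "
        f"precision={precision}, plane=4)",
        "return last",
    ]
    return "\n".join(lines) + "\n"


def sweep(modes=(0, 1, 2), blocks=(16, 32, 64, 128, 256, 512),
          precisions=(0, 1, 2)):
    """Map a file name to script text for every parameter combination."""
    return {
        f"mode{m}_bw{b}_bh{b}_precision{p}.avs": make_script(m, b, p)
        for m, b, p in product(modes, blocks, precisions)
    }


if __name__ == "__main__":
    # Write all scripts into the current directory.
    for name, text in sweep().items():
        with open(name, "w", encoding="ascii") as handle:
            handle.write(text)
```

Each generated .avs file can then be opened in a preview tool and compared against the findings above. A black BlankClip reproduced the error in the first post, but a real 4K source may be needed to see the brightness-dependent artifacts.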

But mode 2 seems to have a special problem. To me it looks as if precision=0 is always used internally when bw/bh < 64.
This is particularly annoying because bw/bh 32 is the default block size.

mode2_bw16_bh16_precision2_3840x2160_8bit (always errors in mode 2, when bw/bh <64, independent from precision)

mode2_bw32_bh32_precision1_3840x2160_8bit (always errors in mode 2, when bw/bh <64, independent from precision)

mode2_bw256_bh256_precision1_3840x2160_8bit (block errors seem to appear only in bright areas)

mode1_bw512_bh512_precision1_3840x2160_8bit (grid errors only occur in dark areas)

Answer 3 · 2022-05-05T08:25:19.000Z

The maximum texture size on my system is 8192x8192, so 4096 is not even close to the limit. That must be enough, so the problem comes from a different source.

Answer 4 · 2022-05-05T14:25:11.000Z

Not sure if it helps, but if I use:

LWLibavVideoSource("G:\TESTCL~1\files\MPEG-4~1.264\4k\4K_SAM~1.MP4",cache=false,format="YUV420P16", prefer_hw=0,repeat=true)
FFT3DGPU(precision=0)

instead of

LWLibavVideoSource("G:\TESTCL~1\files\MPEG-4~1.264\4k\4K_SAM~1.MP4",cache=false,format="YUV420P8", prefer_hw=0,repeat=true)
FFT3DGPU(precision=0)

the output seems fine.
-> so maybe the issue is with the 8bit to 16bit conversion

Answer 5 · 2022-05-10T10:43:03.000Z

Since FLOAT16 precision is not enough for 10+ bits (s10e5), formats over 8 bits are using real 32 bit float, regardless of precision parameter.
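The limited resolution of a 16-bit half float (s10e5) can be illustrated outside of AviSynth. The sketch below is not taken from the FFT3DGPU source; it assumes pixel values are normalized to the range 0..1 before being stored as half floats, which may differ from what the filter does internally. It measures the worst error of a plain store-and-load round trip, expressed in integer code steps of the given bit depth. The function names are made up for illustration.

```python
import struct


def half_roundtrip(x: float) -> float:
    """Store x as an IEEE 754 binary16 (half float) value and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def worst_error_in_code_steps(bits: int) -> float:
    """Largest round-trip error over all code values of one bit depth.

    Assumption: code values are normalized to 0..1 before conversion.
    The result is expressed in units of one integer code step.
    """
    max_code = (1 << bits) - 1
    worst = 0.0
    for code in range(max_code + 1):
        restored = half_roundtrip(code / max_code) * max_code
        worst = max(worst, abs(restored - code))
    return worst


if __name__ == "__main__":
    for bits in (8, 10, 12, 16):
        error = worst_error_in_code_steps(bits)
        print(f"{bits}-bit: worst round-trip error = {error:.3f} code steps")
```

A round trip is the most forgiving case, because no arithmetic is done on the values. Under the stated assumption the error stays far below half a code step for 8-bit input but reaches many code steps for 16-bit input. Intermediate FFT results would need additional headroom on top of that.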

Answer 6 · 2022-05-10T16:45:43.000Z

Okay, so feeding 10+ bit content basically means that 32-bit float is used internally, which leaves us back at the start: precision=0 is broken somehow.

Answer 7 · 2022-05-10T18:57:33.000Z

Yep. Today I spent four hours on the topic but no progress. The easy way would be to disable 16 bit half float for mode 1 (overlaps) and force 32 bit but it would not be a noble deed. Everything has a reason in our known world. Now I'd like to know what it is. But it takes time.

Answer 8 · 2022-05-10T19:00:00.000Z

Thanks for looking into it.
I keep my fingers crossed, that you can find the problem.
(also hoping for a Vapoursynth port of the filter, but that is totally different thing ;))

Answer 9 · 2022-05-10T19:07:40.000Z

Off: First world problems. I'd better spend my time on helping Ukrainian refugees :(

Answer 10 · 2022-05-20T16:42:51.000Z

Off: First world problems. I'd better spend my time on helping Ukrainian refugees :(

If you need some motivating words: the main reason why I use fft3dgpu, and why I started discussing this with Selur, is that fft3dgpu is still a very performant noise filter (more performant than most newer GPU-based filters). It helps to improve the compressibility of most videos by 10-35% without damaging video quality much when using sigma values of 1.0-2.0.
Instead of using very slow codec settings, which take double the time and energy or even more, it can be used to reduce file size without losing much time. The energy consumption of my system increases by a maximum of 30W when using fft3dgpu with 4K 10-bit material, which is roughly 10%, and in most cases computing time increases by maybe 1-5%.

So almost 40% energy can be saved compared to the usage of slow encoder settings which need 2x more time.
Any efficient tool can help achieve the goal of energy independence faster and fill the wallets of dictators less.

Answer 11 · 2022-05-21T07:03:47.000Z

Yep. The thing is totally weird: why does only 16-bit half float have problems? I'm trying to set the parameters so that as little internal processing as possible is done (eliminating sharpening, etc.), but no luck yet.

Answer 12 · 2023-01-02T00:48:37.000Z

I don't know if this is helpful, but I found out that the grid errors shown in the screenshot "mode1_bw512_bh512_precision1_3840x2160_8bit (grid errors only occur in dark areas)" are gone with mode 1, precision 1, bw=bh=512 when I set the overlap parameters ow=256, oh=256 and wintype=2.
Everything else seems to produce those grid errors.

Answer 13 · 2023-01-04T21:52:31.000Z

Maybe this is also interesting for you. Giving fft3dgpu 10 bit input fights banding and grid errors:
https://forum.selur.net/thread-3019.html

Answer 14 · 2023-01-05T08:59:51.000Z

Thanks. As I remember from my previous test, it turned out that the size alone (e.g. reaching a hardware or an API limit) cannot be the reason for the phenomenon.