Wildly increasing memory consumption - video cache auto-tune goes mad at specific access patterns

Question

pinterf opened this issue 10 months ago · 23 comments

As nicely reported on doom9 https://forum.doom9.org/showthread.php?p=1995403#post1995403.

Script:

ColorBarsHD().KillAudio()
Spline36Resize(3840, 2160)
Spline36Resize(1920, 1080)
Spline36Resize(3840, 2160)
Spline36Resize(1920, 1080)
Spline36Resize(3840, 2160)
Spline36Resize(1920, 1080)
Spline36Resize(3840, 2160)
Spline36Resize(1920, 1080)
Prefetch(4)

Open the script in AvsPmod (MPC-HC works as well); you can watch the process on the Task Manager memory page
Press play and let the video play for a bit (~20-25 frames)
Press pause
Framestep backwards

Memory consumption jumps suddenly at around the 10th backstep and again at each following backstep.

(The many occurrences of Spline36Resize just help to exaggerate the effect.)

The problem is probably similar to Issue #270, where a specific access pattern like 0, 0, 0, 1, 1, 2, 1, 3, 2, 4, 2, 5, 3, 6, 3, 7, 4, 8 causes a similar effect, see #270 (comment).

In this issue the access pattern is 1-2-3-4-5-6-...24-25-26- 25-24-23-22-21-20-19...7-6-5

Answer 1 · 2023-12-26T08:16:37.000Z

As a workaround you can use these lines at the beginning of the script.

#SetCacheMode(0) # run until frame 40, then step back in AvsPmod: the first 10 backsteps are fine, from the 11th on each backstep adds about 200 MB of cache
SetCacheMode(1) # no problem
# ... script follows

Answer 2 · 2024-02-17T12:23:28.000Z

Hi, just wondering if there had been any progress on this issue? Are you still confident it's fixable or is it more of a "basket case" problem?

I use QTGMC with multithreading quite a lot, mainly for realtime DVD viewing as it cleans up the image so nicely, and that loads up the CPU on seek, which in turn exacerbates the issue. SetCacheMode(1) is completely incompatible with seeking on my systems so I can't use that.

Answer 3 · 2024-02-19T09:52:23.000Z

No real progress. I'm just trying to understand how the so-called ghost cache entries work, and adding debugging and logging helper code here and there. Even if I were to deal with this daily, it would still take several weeks to complete, I guess. Nevertheless, the issue is a challenge, and I think it's fixable.

Answer 4 · 2024-02-19T15:24:35.000Z

I'm just trying to understand how the so called ghost cache entries work [...]

Maybe I can help with that. The ghost entries are what allow the cache to be adaptive. The basic idea is that a ghost entry is somewhat like a normal cache entry except without the actual data (the frame), and they stay in the cache a little bit longer. Ghost entries are cheap memory-wise as they take up almost no space.

When a frame is requested and it is not in the cache anymore but its ghost still is, it means we have recently used that frame but it didn't live long enough in the cache. So next time we make sure that it stays alive longer before being evicted from the cache. This way, a frame whose ghost is never requested stays in the cache only for a short time (which avoids unnecessary memory consumption), but a frame with many requests to its ghosts stays in the cache progressively longer and longer, until its lifetime doesn't need to be extended anymore.

At least that was the original idea years ago. Once you get the idea it is pretty simple actually. The complex part of the cache is dealing with all this in a thread-safe way.
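To make the idea concrete, here is a minimal Python sketch of an LRU cache with ghost entries. It is a toy model, not the AviSynth+ implementation: the class name `GhostLruCache`, the parameters `capacity` and `ghost_capacity`, and the choice to model "stays alive longer" as a capacity that grows by one per ghost hit are all assumptions made for illustration, and thread safety is ignored entirely.

```python
from collections import OrderedDict


class GhostLruCache:
    """Toy LRU cache with ghost entries; a simplified model, not AviSynth+ code."""

    def __init__(self, capacity=1, ghost_capacity=4):
        self.capacity = capacity              # how many frames may stay cached
        self.ghost_capacity = ghost_capacity  # how many evicted keys are remembered
        self.main = OrderedDict()             # key -> frame, least recently used first
        self.ghosts = OrderedDict()           # key only (no frame data), oldest first

    def get(self, key, produce):
        if key in self.main:                  # normal cache hit
            self.main.move_to_end(key)
            return self.main[key]
        if key in self.ghosts:                # evicted too early: keep frames longer
            del self.ghosts[key]
            self.capacity += 1
        frame = produce(key)                  # miss: compute the frame again
        self.main[key] = frame
        while len(self.main) > self.capacity:
            evicted, _ = self.main.popitem(last=False)
            self.ghosts[evicted] = None       # remember the key, drop the data
            while len(self.ghosts) > self.ghost_capacity:
                self.ghosts.popitem(last=False)
        return frame


def render(n):
    return f"frame {n}"


linear = GhostLruCache()
for n in range(10):                # 0, 1, 2, ... never revisits a frame
    linear.get(n, render)

alternating = GhostLruCache()
for n in [0, 1, 0, 1, 0, 1]:       # a filter that keeps needing two frames
    alternating.get(n, render)

print(linear.capacity, alternating.capacity)
```

In this model the linear reader never touches a ghost, so its capacity stays at 1, while the alternating reader hits a ghost once, grows to 2 and then stops growing because both frames now fit.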

Answer 5 · 2024-02-20T05:09:47.000Z

I was thinking maybe it's possible to do a bodge solution in the meantime, like simply detecting when the auto-tune went mad on seek and resetting the process's memory usage, which goes something like this in the Windows API...

# get a handle to the process running Avisynth.dll
# (GetCurrentProcess returns a pseudo-handle, so it does not need CloseHandle)
handleToProcess = GetCurrentProcess()

# remove as many pages as possible from its working set memory
# (in C/C++ both sizes are passed as (SIZE_T)-1)
SetProcessWorkingSetSize(handleToProcess, -1, -1)

https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-setprocessworkingsetsize

This is obviously "bad practice" as a long-term default solution, but as a short-term non-default option it might be preferable to hitting the SetMemoryMax() size and getting slowdowns. (I'm currently compensating for the slowdown by giving QTGMC an extra thread or two; it works okay but sometimes crashes when I alt-tab with maxed-out memory usage.)

edit: if I recall correctly you did something with SoxFilter 2.2 to make it reinitialise on seek to make it compatible with realtime seeking, so maybe the memory reinitialisation could be done on seek only, and only if the issue occurred, so that SetProcessWorkingSetSize() would only rarely be called

Answer 6 · 2024-02-21T08:30:26.000Z

Hi pylorak, thanks for the explanation.

The problem is that in specific scenarios the size of the main cache is always incremented by one: since the item is found among the ghost entries, the value of "ghosted" is always 1 in this case (which is > 0).

https://github.com/AviSynth/AviSynthPlus/blob/master/avs_core/core/LruCache.h#L232
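The growth can be reproduced with a small Python simulation of the access pattern from this issue (1 up to 26, then back down to 5). This is a sketch under assumptions, not the code in LruCache.h: `simulate`, the starting capacity of 2 and the ghost list size of 100 are made up for illustration, and the only rule taken from the discussion is "a ghost hit increments the main cache size by one".

```python
from collections import OrderedDict


def simulate(pattern, capacity=2, ghost_capacity=100):
    """Return the main cache capacity after each request (toy model)."""
    main, ghosts = OrderedDict(), OrderedDict()
    history = []
    for frame in pattern:
        if frame in main:
            main.move_to_end(frame)            # plain cache hit
        else:
            if frame in ghosts:                # "ghosted" > 0: grow by one
                del ghosts[frame]
                capacity += 1
            main[frame] = None
            while len(main) > capacity:
                evicted, _ = main.popitem(last=False)
                ghosts[evicted] = None         # evicted frame leaves a ghost behind
                while len(ghosts) > ghost_capacity:
                    ghosts.popitem(last=False)
        history.append(capacity)
    return history


forward = list(range(1, 27))        # play: 1, 2, ..., 26
backward = list(range(25, 4, -1))   # backstep: 25, 24, ..., 5
history = simulate(forward + backward)

print("capacity after playing forward:", history[len(forward) - 1])
print("capacity after backstepping:   ", history[-1])
```

During forward play no ghost is ever hit, so the capacity stays at its starting value. Once backstepping runs past the frames still held in the main cache, every further step lands on a ghost and the capacity grows by one each time, with nothing evicted in between.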

Answer 7 · 2024-02-21T16:27:10.000Z

What happens is that when old frames are already ghosted but not yet removed from the ghost entries, the user begins to backstep in the video and thus the cache hits the ghost entries again, thereby causing the cache to grow.

I think the root cause of the problem here is that the cache does not know that the video step direction has changed.
From the cache's point of view, hitting a ghost entry because it is the filter's regular access pattern and hitting it because the user re-requested an earlier frame look exactly the same ("earlier frame" here means not a frame with a lower frame number, but a frame that the user has already viewed recently; the problem is not going backwards, the problem is changing the direction).

My proposed solution is to clear the ghost list of all caches whenever the user changes step direction.
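A rough Python sketch of that proposal, using a toy single-cache model: the ghost list is emptied whenever the step direction of the incoming requests flips. The function name, the starting capacity and the ghost list size are hypothetical, and the model assumes the cache sees the user's requests directly, which is not how a real filter chain behaves.

```python
from collections import OrderedDict


def simulate_with_ghost_reset(pattern, capacity=2, ghost_capacity=100):
    """Toy cache that clears its ghosts when the step direction changes."""
    main, ghosts = OrderedDict(), OrderedDict()
    previous, direction = None, 0
    for frame in pattern:
        # Detect a direction flip; repeated requests for the same frame are ignored.
        if previous is not None and frame != previous:
            new_direction = 1 if frame > previous else -1
            if direction != 0 and new_direction != direction:
                ghosts.clear()                 # the proposed fix
            direction = new_direction
        previous = frame

        if frame in main:
            main.move_to_end(frame)
            continue
        if frame in ghosts:                    # ghost hit still grows the cache
            del ghosts[frame]
            capacity += 1
        main[frame] = None
        while len(main) > capacity:
            evicted, _ = main.popitem(last=False)
            ghosts[evicted] = None
            while len(ghosts) > ghost_capacity:
                ghosts.popitem(last=False)
    return capacity


pattern = list(range(1, 27)) + list(range(25, 4, -1))   # 1..26, then 25..5
print(simulate_with_ghost_reset(pattern))
```

With the ghosts cleared at the turnaround, the backsteps are ordinary misses and the capacity stays at its starting value for this pattern. The catch, as in the real cache, is that the reset is only safe if the direction is taken from the user's requests and not from a filter's own back-and-forth access pattern.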

Answer 8 · 2024-03-22T04:14:56.000Z

In the meantime is it possible to give us an Avisynth internal function which we can call inside our scripts to manually clear the ghost entries in the cache? Then maybe I could call it when the user seeks (detecting the seek inside ScriptClip, so I would need to be able to call it inside a ScriptClip).

I have tried outputting BlankClip() for a few seconds on seek to try and unload the CPU and it seems to somewhat reduce the chance of getting a cache frenzy when seeking +/- 10 seconds, but doesn't help with the 1 frame backwards seeks. Doing a +/- 10 second seek is common during realtime screening so it's better than nothing.

Answer 9 · 2024-03-22T09:31:53.000Z

Actually I don't think that would work reliably because current_frame inside ScriptClip is often not in sync with what Avisynth is processing internally. Only Avisynth would know for sure when the frame order changed due to user seek. That's probably why my BlankClip() workaround only works some of the time.

Answer 10 · 2024-03-25T12:23:16.000Z

Meanwhile I did some tests but made no real progress on the topic. I did put some extra logging (frame requests, internal pattern direction recognition) into Avisynth. It turned out that AvsPMod's frame requests are a bit weird, I don't know the reason: frames seem to be requested multiple times when single-stepping one by one.

E.g. this pattern (manual steps): 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 6, 6 (I then jumped forward a bit and reversed the direction), 55, 55, 55, 54, 54, 54, 53, 53, 53.

Anyway, I'd expect a single-step, single-frame-request pattern. Whether this pattern confuses Avisynth's internal "pattern lock" prediction, I don't know yet. My progress stopped here two weeks ago and I have not been able to continue debugging since then.
Also did some experimental hacks on clearing the ghosts, but it relies on recognizing the change of the pattern (frame request orders) direction.
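As an illustration of direction recognition that tolerates repeated requests, here is a small Python sketch run on the logged pattern above. It is not the logic inside Avisynth; `direction_changes` is a hypothetical helper that skips duplicate frame numbers and reports where the sign of the step flips.

```python
def direction_changes(requests):
    """Return the indices in `requests` where the step direction flips."""
    changes = []
    previous, direction = None, 0
    for index, frame in enumerate(requests):
        if previous is not None and frame != previous:   # ignore repeated requests
            new_direction = 1 if frame > previous else -1
            if direction != 0 and new_direction != direction:
                changes.append(index)
            direction = new_direction
        previous = frame
    return changes


logged = [1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 6, 6, 55, 55, 55, 54, 54, 54, 53, 53, 53]
print(direction_changes(logged))
```

For this log the only flip reported is at the first request for frame 54. The forward jump from 6 to 55 keeps the same sign, so it does not count as a change in this simple model.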

Answer 11 · 2024-03-26T04:45:24.000Z

Also did some experimental hacks on clearing the ghosts, but it relies on recognizing the change of the pattern (frame request orders) direction.

Yes, I think that's what pylorak is suggesting too, and is what I was trying to do inside ScriptClip with something like:

if ( current_frame > previous_frame + seek_thresh
\ || current_frame < previous_frame - seek_thresh ){ 
     return BlankClip()  # in lieu of clearing cache ghosts 
}

But current_frame is not accurate so it doesn't work reliably. I reckon if current_frame was accurate then it may work, but then again I don't know how Avisynth works internally whether that would muck other things up. I'm guessing it would probably at least make a huge delay when seeking which may not be good either.

As this issue only affects seeking which is only a concern when using Avisynth for realtime live playback, maybe it's worth having a third cache mode the user can select like

0 = CACHE_FAST_START (default)
1 = CACHE_OPTIMAL_SIZE
2 = CACHE_REALTIME ?

Answer 12 · 2024-03-26T10:27:20.000Z

Frame order prediction does not work per-clip, it serves the prefetch mechanism (steps and proper direction) and acts at the very origin of the frame requests.

Answer 13 · 2024-04-11T03:06:44.000Z

There's already this function Preroll which "works by detecting any out of order access in the audio or video track, and seeking the specified amount earlier in the stream and then taking a contiguous run up to the desired frame". Maybe a solution could be implemented in there?
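Going only by the quoted description, the behaviour can be modelled roughly as below. This is a Python toy model, not Preroll's actual implementation: `frames_to_decode` and its parameters are hypothetical names, and treating the very first request as out of order is an assumption.

```python
def frames_to_decode(requested, last_served, preroll):
    """Frames a Preroll-like filter would pull to serve `requested` (toy model)."""
    if last_served is not None and requested == last_served + 1:
        return [requested]                    # in-order access: nothing extra to do
    start = max(0, requested - preroll)       # out of order: back up `preroll` frames
    return list(range(start, requested + 1))  # then run contiguously up to the target


print(frames_to_decode(11, 10, 5))    # linear playback
print(frames_to_decode(100, 10, 5))   # seek forward
print(frames_to_decode(2, 10, 5))     # seek back near the start of the clip
```

In this model a seek always turns into a short contiguous run ending at the requested frame, which is the property that keeps current_frame == previous_frame + 1 for whatever sits upstream.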

I currently use Preroll on my ScriptClips as it seems to help them process frames in linear order (helps keep current_frame==previous_frame+1 inside the ScriptClip body).

Answer 14 · 2024-04-12T18:36:43.000Z

Hello pinterf,
AvsPmod requests the current frame exactly 2 times, once for the source clip and once for the display clip.
The display clip is derived from the source clip with 'Eval'.
1.) there is no other way (Display, Pixel Value, DisplayFilter etc.)
2.) It has always been like this.
3.) It makes almost no difference to the speed (tested by myself).

The Prefetch(1,1) that you noticed is an option and can be switched off under Video > Display > 'Prefetch RGB Display conversion'.

What I have forgotten:
If the D3D window is also used for the display, then there can also be 3 frame calls. The D3D window uses its own YUV420P8 clip.

Answer 15 · 2024-04-13T05:32:03.000Z

Thank you for the clarification, I just didn't understand why there are multiple calls instead of a one-by-one plus or minus pattern. Of course on my side, inside a Prefetch object they are consistent, but now it's easier to debug it if I watch only one of them.