libsdl-org/SDL

Windows: main thread is blocked when user resizes or moves a window

SDLBugzilla opened this issue · 63 comments

This bug report was migrated from our old Bugzilla tracker.

These attachments are available in the static archive:

Reported in version: HG 2.1
Reported for operating system, platform: Windows (All), All

Comments on the original bug report:

On 2013-08-30 01:00:19 +0000, wrote:

When the user clicks on the window title or border, the system generates a WM_NCLBUTTONDOWN message. When DispatchMessage receives this message, it handles window resizing or moving and doesn't return until the user releases the mouse button. It also sends WM_WINDOWPOSCHANGING to the window proc. When using WinAPI directly, it is not a big problem that DispatchMessage blocks, because it is possible to handle the WM_WINDOWPOSCHANGING message (or use WM_TIMER) to perform actions that must run regularly. But with SDL it seems to be impossible to do anything in the main thread while the user moves or resizes the window.

On 2014-01-20 05:04:25 +0000, Nathaniel Fries wrote:

I actually spent my weekend fixing this.

I'm not sure when I'll have time again to work on something, but I did upload my code for it, so someone should be able to whip up a patch fairly easily.

For an SDL-specific patch, I wouldn't bother with using a thread-local (SDL doesn't enable the creation of new GUI threads), sending WM_SIZING, or with MINMAXINFO at all.

It even includes (untested) code for child windows, so it should hopefully work in cases where SDL is used from a widget provided by Qt or some other toolkit.

it's this sourceforge project here: https://sourceforge.net/projects/win32loopl/

On 2014-01-20 16:48:16 +0000, Nathaniel Fries wrote:

*** Bug 2316 has been marked as a duplicate of this bug. ***

On 2014-01-20 16:54:40 +0000, Nathaniel Fries wrote:

Just a heads up, the above fix for this probably shouldn't be default behavior because it can cause resizing and moving to become choppy to the user if rendering or other main loop code takes too long. It could cause new bug reports from developers of pre-existing SDL2 applications who are simply passing on bug reports from users who updated their SDL2 dll. I'd recommend it as a feature that can be turned on or off by the programmer, and defaults to off.

On 2014-02-06 05:21:48 +0000, Nathaniel Fries wrote:

Created attachment 1549
patch

Finally found time to get around to making a proper patch. Code is mostly the same as I wrote before, but adapted for what SDL looks like internally. Doesn't make modeless behavior optional, though.

On 2014-02-09 10:08:44 +0000, Sam Lantinga wrote:

Thanks! We'll take a look at this after the 2.0.2 release.
This also potentially fixes issues with dragging the titlebar when the cursor is grabbed?

On 2014-02-09 12:43:34 +0000, Nathaniel Fries wrote:

"This also potentially fixes issues with dragging the titlebar when the cursor is grabbed?"

Not sure what you mean by this. This code has to acquire mouse focus in order to receive all necessary mouse movements.

On 2014-02-09 20:36:01 +0000, Sam Lantinga wrote:

Yes, but we're in control of the movement process so we can account for our own grab state. It's not a fix, it just makes it possible to fix. :)

On 2014-02-09 23:31:57 +0000, Nathaniel Fries wrote:

I suppose that if mouse focus is lost, we should take it back. Might be a possible bug in this code. MSDN says not to call SetCapture when processing WM_CAPTURECHANGED, so I can see how this could have been a difficult issue previously.
Now, of course, we can add a simple fix in SDL_PumpEvents or elsewhere, but we'd still have a chance of losing some events that way. A better fix would simply be to use the cursor pos attached to a windows message (__tagMSG::pt), compare it to the capture position, and work from there instead of WM_MOUSEMOVE. Then we won't even need to worry about mouse capture. Just a thought and haven't had a chance to test it, though.

Also, there was a (quite obvious once I noticed it) bug in my last patch. This is what I get for not testing thoroughly. When handling WM_MOUSEMOVE, I use lParam instead of the result from GetMessagePos. lParam is relative to the client area, which means it can be negative; GetMessagePos is in screen coordinates. This is what you get for not taking your time. :)
here's a hand-written patch for my patch to correct this:

             if(data->in_modeless_resize)
             {
                 POINT ptPos;
+                DWORD dwPos = GetMessagePos();
-                ptPos.x = GET_X_LPARAM(lParam);
-                ptPos.y = GET_Y_LPARAM(lParam);
+                ptPos.x = GET_X_LPARAM(dwPos);
+                ptPos.y = GET_Y_LPARAM(dwPos);
                 WIN_DoResize(hwnd, data, ptPos, SDL_FALSE);
             }

On 2014-02-20 21:05:23 +0000, Nathaniel Fries wrote:

Created attachment 1568
better patch

Attached is a much better patch. I wasn't sure whether the values returned by SDL_GetWindow[Min/Max]imumSize were client size or window size, so that may need to be corrected inside WIN_DoResize.

Still doesn't add a new window flag for modeless behavior.

When mouse capture is lost, the modeless resize/movement operation is finalized. This is because:

  1. Attempting to reclaim mouse capture in handling WM_CAPTURECHANGED actually crashes the program.
    Without mouse capture, we won't get messages for mouse movement outside the current boundaries of the window.
  2. MSDN states that only the Foreground Window can capture the mouse, and presumably there's a good reason for an app to claim Foreground Window status (either in response to user input, or an application had to alert the user of something).
    The user will probably interact with this foreground window, even if just to remove its foreground window status, so it seems silly for SDL to continue acting as if the user is interactively resizing a Window.

On 2014-02-25 12:53:59 +0000, Sam Lantinga wrote:

Reviewing the code, it looks pretty good. I'm looking forward to trying it out after 2.0.2 is released.

The values returned by SDL_GetWindow[Min/Max]imumSize are client size.

On 2014-03-05 11:12:47 +0000, Andreas Ertelt wrote:

There's one tiny issue I experience with this code - in multi-monitor setups the window always jumps back to the primary monitor when being picked up.

Also, since this patch was submitted, handling for WM_NCLBUTTONDOWN was added - the current code would have to be added to the default-case of this patch and the return statement should be removed.

On 2014-03-06 10:06:56 +0000, Andreas Ertelt wrote:

Another minor issue is that I am receiving SDL_MOUSEMOTION events when moving or resizing the window in a way that the mouse cursor temporarily hovers the window.
I also receive two of those events when just clicking and holding the window on the title or border as well as another when releasing.

On 2014-03-09 01:08:36 +0000, Nathaniel Fries wrote:

"There's one tiny issue I experience with this code - in multi-monitor setups the window always jumps back to the primary monitor when being picked up."
something, isn't it?
I don't know what would cause this and I don't have a multi-monitor setup to test on, so I'm afraid I'll have to leave that fix to someone else.

I've identified the likely cause of that minor issue (double SDL_MOUSEMOTION events). When WIN_DoResize is called in response to WM_MOUSEMOVE, the last argument should be SDL_FALSE instead of SDL_TRUE (SDL_TRUE indicates that it should "force" the cursor position to a correct value after resizing).

On 2014-03-11 07:11:10 +0000, Andreas Ertelt wrote:

Didn't have much time to test this, but the change you suggested caused the application to crash (quite literally).

The multi-monitor issue is related to the GetSystemMetrics() call, which is fed with SM_CXSCREEN/SM_CYSCREEN and therefore limits the routine to the primary monitor. Instead you would have to use MonitorFromRect() to find the current/nearest monitor (using mouse coordinates and MONITOR_DEFAULTTONEAREST) and then retrieve its size/coordinates using GetMonitorInfo().
Alternatively using SM_CXVIRTUALSCREEN/SM_CYVIRTUALSCREEN would be a quick fix, but that would make that part a bit pointless for setups where monitors don't use the same resolutions and/or aren't properly aligned.
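To make the difference concrete, here is a portable model of "pick the monitor nearest to the cursor" rather than always assuming the primary one. This is only a sketch: the `Rect` type and both functions are hypothetical, and the distance metric is an assumption, not necessarily what Windows uses internally. In a real Win32 build, MONITOR_DEFAULTTONEAREST and GetMonitorInfo() would do this work for you.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rectangle type; right/bottom are exclusive, like Win32 RECT. */
typedef struct { int left, top, right, bottom; } Rect;

/* Squared distance from a point to a rectangle (0 if the point is inside). */
static long dist2_to_rect(const Rect *r, int x, int y)
{
    long dx = 0, dy = 0;
    if (x < r->left) dx = r->left - x;
    else if (x >= r->right) dx = x - (r->right - 1);
    if (y < r->top) dy = r->top - y;
    else if (y >= r->bottom) dy = y - (r->bottom - 1);
    return dx * dx + dy * dy;
}

/* Return the index of the monitor nearest to (x, y); ties keep the
 * earlier entry, so index 0 (the primary) wins only when it is closest. */
static size_t nearest_monitor(const Rect *monitors, size_t count, int x, int y)
{
    size_t best = 0;
    assert(monitors != NULL && count > 0);
    for (size_t i = 1; i < count; ++i) {
        if (dist2_to_rect(&monitors[i], x, y) <
            dist2_to_rect(&monitors[best], x, y))
            best = i;
    }
    return best;
}
```

Clamping against `monitors[nearest_monitor(...)]` instead of the primary monitor's size is what keeps a window picked up on the second display from jumping back to the first.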

Another side effect I found is that the Aero features Snap and Shake stop working. Not quite sure how to emulate those correctly (especially since Snap delivers visual feedback as well).

[HKEY_CURRENT_USER\Software\Policies\Microsoft\Windows\Explorer] "NoWindowMinimizingShortcuts" defines the state of shake.

[HKEY_CURRENT_USER\Control Panel\Desktop] "WindowArrangementActive" defines the state of snap.

On 2014-03-12 19:09:40 +0000, Nathaniel Fries wrote:

Actually MSDN makes it seem like the default maximum window tracking dimension is GetSystemMetrics(SM_C[X/Y]MAXTRACK) regardless of the monitor the window is on.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms724385%28v=vs.85%29.aspx

I never knew of the shake and snap features. I've played around with them briefly, and shake appears to be relatively simple to implement using documented User32 calls (however, because I'm using documented User32 calls to change window position, they require a full redraw which appears to take longer than whatever User32 does internally - this may make the shake feature appear laggy as well as require we take some liberty with the timing). Snap would require I somehow cover the entire screen with a blue highlight, and I'm not sure how to do that without creating a window the size of the desktop (and we return to mouse focus issues).

On 2014-03-12 20:40:33 +0000, Nathaniel Fries wrote:

Actually, by playing around I think I've found a relatively simple way to highlight the entire screen, but it will require different code for windows on extra monitors so I can only guarantee something that would work on single-monitor systems. Might be awhile before I can materialize a complete fix, though.

On 2014-03-16 09:42:03 +0000, Nathaniel Fries wrote:

Believe it or not, I'm finding it harder to get shake just right than snap. I have basically functional versions of both in my little project on sourceforge now.

I won't be making another patch for SDL until I've got all the little quirks worked out though. Might be some time.

On 2019-12-07 17:00:27 +0000, Jake Del Mastro wrote:

Has there been any progress on this bug? I'm noticing this still seems to be an issue in SDL 2.0.10

On 2020-03-24 21:13:36 +0000, Ryan C. Gordon wrote:

(In reply to Jake Del Mastro from comment # 18)

Has there been any progress on this bug? I'm noticing this still seems to be
an issue in SDL 2.0.10

Reading through all these comments, is this something we really want? It sounds like something that we're going to have to maintain every time Microsoft adds/changes a UI mechanic, and never get quite right, and introduce a bunch of risky behaviors, just to be more responsive when someone drags the window.

I'd be inclined to mark this WONTFIX, but I'll let Sam make that decision if he wants.

--ryan.

On 2020-04-16 16:50:42 +0000, Ron Aaron wrote:

It's not just an issue on Windows. macOS has the same problem (don't know if it's for a similar reason)

On 2020-04-16 19:19:48 +0000, Andreas Ertelt wrote:

Ryan is correct, the way this patch approaches the issue would require changes over time to stay consistent with Windows behavior and there are too many corner cases to consider.

But a problem should definitely not be marked WONTFIX just because a suggested solution is inadequate.

While I'm fairly sure there is no feasible solution that fully fixes the issue as it was reported here, the likely prime issue most people are concerned with is not being able to perform drawing operations / simulation anymore.

This could be addressed on Windows by allowing developers to register a callback (per window) to be performed on its WM_SIZING(!), WM_PAINT and likely also WM_ERASEBKGND events. If this feature is used, the message loop would also have to call InvalidateRect on the window whenever no more messages are in the queue, and upon completion of the callback a ValidateRect on the window would have to be issued (this is to make sure WM_PAINT events keep getting issued when nothing else is happening).

I'm confident most other platforms could be handled in a similar fashion.

This approach wouldn't affect existing programs in any way and provide developers who care about not being interrupted for an unreasonable amount of time with the means to address the issue with minimal changes and without having to hijack the window's message handler.

On 2020-04-18 13:08:58 +0000, Andreas Ertelt wrote:

I just checked my engine code and there are three more corner cases to be considered on Windows that I didn't think of anymore.

One is system/context menus, another is when a modal window is opened (e.g. a message box), and the last is picking the window up without moving it (which can also be the case when moving isn't configured to redraw the window in Windows' performance options).

I worked around all of this by starting a timer on the window that triggers the redraws. This timer is started under the following conditions:

  1. WM_SYSCOMMAND is received with (wparam & 0xfff0) == SC_MOVE (this also happens when the regular window menu is opened).
  2. WM_ENTERMENULOOP is received (this stops context menus from interrupting the program).
  3. WM_ENABLE is received with a wparam of 0.

The only slight annoyance I could notice at this point is when you hold down the caption bar with the mouse, it takes a second to actually call the first timer-event. This can be slightly alleviated by allowing WM_GETICON to trigger a draw while the timer is active. The WM_GETICON-behavior has likely been introduced with Vista - I currently have no older machine to verify this on.

This redraw timer can then be deleted on the next proper WM_PAINT message received while the window is active again (WM_ENABLE).

In my program I trigger this timer at the refresh rate and make sure there is no more message like it in the queue before issuing the draw call (to avoid clogging the message queue).

I can't think of an alternative to using a timer here, since control over the message loop is temporarily diverted and the only event reliably triggered is WM_GETICON at a 1 Hz frequency. At least I couldn't find any other way to introduce events under these conditions.

On 2020-07-12 09:53:51 +0000, Jack C wrote:

Any updates to this bug? I like Andreas Ertelt's idea of introducing optional callbacks for those events. I know Blender's approach to drawing while resizing the window is handled in WM_SIZE/WM_SIZING event. There is an event dispatch call under "case WM_SIZE:" that will lead to a draw call.

You can find the code I am referring to here.

https://github.com/blender/blender/blob/404486e66c6a4ebebb085700d58b396597146add/intern/ghost/intern/GHOST_SystemWin32.cpp#L1659

Fix it already, it's been 8 years!

Making the executive decision to close this bug as wontfix; this isn't worth all the known problems and unknown risks that fixing it would cause.

So the fundamental issue is that the way SDL gives you events is fundamentally at odds with how Win32 wants your program to handle events. What you're supposed to do (according to Win32) is have a "window procedure" (callback) which runs for each event. SDL provides this callback for you, but the SDL callback just records events into the event queue for you to respond to later.

One of the events that you're supposed to respond to during your callback is a repaint event. SDL can't repaint for you, but usually this isn't an issue because shortly after SDL puts all the events in the queue, you grab events from the queue and repaint yourself.

The problem is that while the user is holding the mouse button down during a resize of the window, control never returns to the main program. The user32 event loop will just hold on to your program's control flow and continually call your window procedure, giving you resize-related events and paint events.

If you respond to the paint events by painting immediately within the window procedure then you'll get a program that behaves "properly" during a resizing. However, this runs totally counter to SDL's event queue system.

The only way to fix this is to entirely replace one of SDL's core components.

In other words, the wontfix assessment is fair.
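The mismatch described above can be shown with a toy model that involves no Win32 or SDL calls at all. Everything here is hypothetical and only illustrates the control flow: a stand-in modal loop keeps calling the window procedure, and a procedure that merely queues events leaves the application with nothing handled until the loop returns.

```c
#include <assert.h>

#define QUEUE_MAX 64

typedef struct {
    int queued[QUEUE_MAX]; /* events recorded by the window procedure */
    int count;             /* number of events waiting in the queue */
    int handled_by_app;    /* events the application loop has processed */
    int painted_in_proc;   /* repaints done directly in the window procedure */
} Model;

/* SDL-style window procedure: only records the event for later. */
static void wndproc_queue_only(Model *m, int event)
{
    assert(m->count < QUEUE_MAX);
    m->queued[m->count++] = event;
}

/* Win32-style window procedure: responds (e.g. repaints) immediately. */
static void wndproc_immediate(Model *m, int event)
{
    (void)event;
    m->painted_in_proc++;
}

/* Stand-in for the modal resize loop: it keeps calling the window procedure
 * and does not return until the simulated mouse button is released. */
static void modal_resize_loop(Model *m, void (*wndproc)(Model *, int),
                              int n_events)
{
    for (int i = 0; i < n_events; ++i)
        wndproc(m, i);
}

/* Application side: drain the queue, as an event-polling loop would. */
static void app_drain_queue(Model *m)
{
    m->handled_by_app += m->count;
    m->count = 0;
}
```

In the queue-only case the application sees all ten events at once, after the drag has finished. In the immediate case each one is answered while the drag is still in progress.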

How is there risk to making it so you can drag a window without it pausing a program?

This thread lists multiple problems and potential future incompatibilities.

However, you're welcome to use the attached patch in your code, if you're comfortable with the drawbacks.

There needs to be some kind of SDL hint, or something along those lines to fix this behavior, because this is making SDL2 borderline unusable for games that have lockstep netcode. (one person decides to drag or resize their window, and the whole session dies, pissing off all of the players trying to play. YAY!)

One shouldn't need to rely on a patch that is old and insanely hard to find (I've been searching for such a thing for weeks, only found this now.) just to get past such an obvious and horrid issue. Not to mention said patch likely can't even be merged with current SDL2 anymore due to its age.

Been struggling with this problem for years over multiple projects, and I'm tired, frustrated, and fucking desperate for something, anything, that can remedy it.

one person decides to drag or resize their window, and the whole session dies, pissing off all of the players trying to play.

What happens to the other players when someone unplugs their network cable in this scenario?

@icculus Unplugging your network cable as the host of a multiplayer game would indeed disconnect everyone else playing the game (If you're not the host, as with a client/server model, it would at minimum disconnect yourself). But that's completely expected by the user who unplugged their network cable, both the client/server and the p2p results of that action are well-understood by the user and by the game dev, and as game devs we can add a message like "The host has disconnected" which the users would be able to figure out in a crystal clear manner that because Robert unplugged his network cable, and Robert was (presumably) the host in the p2p game, everyone got disconnected. E.g., No bug tickets for us the dev team, because the users fully understood exactly what happened.

Having everyone (or even just yourself) disconnect just because you dragged a window is very subtle and frustrating, and it would be difficult for the user to even realize that it was the dragging that caused the issue, as opposed to just thinking your application is sucky. It took me as the dev countless hours of debugging to realize that my application client was disconnecting from the server every once in a while because I was dragging the window. Dragging the window just isn't something that I interpret as an action that could affect my application; it's just a subconscious thing I do to ensure that things are placed well. Additionally, for me, dragging the window only disconnected client from server about 1/4th of the time, which makes it even harder to make that association. It just looked like a completely random bug that we couldn't figure out how to reproduce consistently for the longest time, but it made the application somewhat annoying to use for long periods of time. It's not like our users ever reported that they were dragging the window when it happened; they had no idea how to replicate it either, and it just happened randomly from their point of view. Once we figured out the association it wasn't hard to find this GitHub issue, but something better can be done here.

vvvvvvvvvvvvvvvvvvvvvvv

Imo, at the absolute minimum, the documentation of SDL_PollEvent desperately needs to say that it will block if the user drags or resizes the window on the Windows OS. Then at least developers can work around the issue and maintain network connections on another thread without it being an unnecessarily large refactor after the fact [as it was for us].

^^^^^^^^^^^^^^^^^^^^

^ This. Very much this.

One player (not even the host) was dragging their window in a match I had and everyone was confused as to why everyone was suddenly lagging. (To be specific, I'm working on a netplay-centric port of Duke Nukem 3D, which uses a master/slave lockstep form of networking, so if ANYONE so much as sneezes on their window, it'll hang the whole match until the operation is done, and add a bunch of persistent lag over the next minute or two as the input lag buffer gets inflated to hell and back to compensate.)

Wouldn't be the first time this has happened, either. Adding to my frustration and abrasive demeanour right now is getting blamed for it and/or being told my port sucks because of something out of my control.

What happens to the other players when someone unplugs their network cable in this scenario?

The entire game hangs for everyone, and they have to quit. Doesn't matter if it's the host or a client. (The unfortunate downside to lockstep netcode)

Would it be possible to set a custom WindowProc function on the window that receives the WM_MOVE, etc. and handles whatever updates need to be done application side before passing the events off to SDL's WindowProc?

is getting blamed for it and/or being told my port sucks because of something out of my control.

I wrote one of the first UDP implementations for Duke3D back in the day, so I totally get this. But the fragility of Duke's system is going to bite you sooner or later, window dragging or not. The extremely non-trivial but correct approach would be to replace that netcode with something more robust...but dear lord, that would be a painful effort.

Some other approaches to try:

  • Abuse the hit test API:

    /**
     * Callback used for hit-testing.
     *
     * \param win the SDL_Window where hit-testing was set on
     * \param area an SDL_Point which should be hit-tested
     * \param data what was passed as `callback_data` to SDL_SetWindowHitTest()
     * \return an SDL_HitTestResult value.
     *
     * \sa SDL_SetWindowHitTest
     */
    typedef SDL_HitTestResult (SDLCALL *SDL_HitTest)(SDL_Window *win,
                                                     const SDL_Point *area,
                                                     void *data);
    
    /**
     * Provide a callback that decides if a window region has special properties.
     *
     * Normally windows are dragged and resized by decorations provided by the
     * system window manager (a title bar, borders, etc), but for some apps, it
     * makes sense to drag them from somewhere else inside the window itself; for
     * example, one might have a borderless window that wants to be draggable from
     * any part, or simulate its own title bar, etc.
     *
     * This function lets the app provide a callback that designates pieces of a
     * given window as special. This callback is run during event processing if we
     * need to tell the OS to treat a region of the window specially; the use of
     * this callback is known as "hit testing."
     *
     * Mouse input may not be delivered to your application if it is within a
     * special area; the OS will often apply that input to moving the window or
     * resizing the window and not deliver it to the application.
     *
     * Specifying NULL for a callback disables hit-testing. Hit-testing is
     * disabled by default.
     *
     * Platforms that don't support this functionality will return -1
     * unconditionally, even if you're attempting to disable hit-testing.
     *
     * Your callback may fire at any time, and its firing does not indicate any
     * specific behavior (for example, on Windows, this certainly might fire when
     * the OS is deciding whether to drag your window, but it fires for lots of
     * other reasons, too, some unrelated to anything you probably care about _and
     * when the mouse isn't actually at the location it is testing_). Since this
     * can fire at any time, you should try to keep your callback efficient,
     * devoid of allocations, etc.
     *
     * \param window the window to set hit-testing on
     * \param callback the function to call when doing a hit-test
     * \param callback_data an app-defined void pointer passed to **callback**
     * \returns 0 on success or -1 on error (including unsupported); call
     *          SDL_GetError() for more information.
     *
     * \since This function is available since SDL 2.0.4.
     */
    extern DECLSPEC int SDLCALL SDL_SetWindowHitTest(SDL_Window * window,
                                                     SDL_HitTest callback,
                                                     void *callback_data);

    ...which will call a function that you specify constantly while the mouse is dragging; it's meant to be used to say "treat this coordinate as part of the title bar, etc" so you can do things like drag a window from the middle, but you could also use it to update state, send a non-blocking packet if it's time to do so, etc, as long as you do it fast in general and return right away if it's not time to do anything yet. This would avoid adding any Windows-specific code to your app. This would be SDL_SetWindowHitTest(), and your callback would just always return SDL_HITTEST_NORMAL.

  • If you don't mind poking at win32, you can try SDL_WindowsMessageHook:

    typedef void (SDLCALL * SDL_WindowsMessageHook)(void *userdata, void *hWnd, unsigned int message, Uint64 wParam, Sint64 lParam);
    
    /**
     * Set a callback for every Windows message, run before TranslateMessage().
     *
     * \param callback The SDL_WindowsMessageHook function to call.
     * \param userdata a pointer to pass to every iteration of `callback`
     */
    extern DECLSPEC void SDLCALL SDL_SetWindowsMessageHook(SDL_WindowsMessageHook callback, void *userdata);

    ...which literally just gives you first shot at win32-level events, before SDL does anything with them, and this might be enough.

  • SDL_AddEventWatch is similar, but you only see SDL-level events, and you only see them when pumping the event queue, which may or may not be enough.

I wrote one of the first UDP implementations for Duke3D back in the day, so I totally get this. But the fragility of Duke's system is going to bite you sooner or later, window dragging or not.

Thankfully I've spent a few years at this point refactoring the whole thing, it's in a much better state than the old days. Basically impossible to go out of sync now unless someone makes a mod with faulty behaviour like making RNG calls during display events.

If the network is suffering packet loss, or extreme latency, it just waits before advancing (however, if there's a full connection loss, it'll stay waiting forever, but menus and stuff still work. This is the case right now if someone drags their window for too long), unlike DOS Duke which often would just have a massive hernia and then continue while remaining out of sync, fully locking up once you attempt to quit or start a new game.

Prediction code is also in the process of being completely overhauled. The plan is to implement a full rollback system and in-game joining at some point, as well. Failing that, I do have a WIP client/server branch which is partially functional, but buggy as shit simply due to how Duke3D was designed.

Just, the only major problem I'm suffering with now is window events. Hoping perhaps with these functions listed, I can figure something out. Thanks.

ell1e commented

@slouken I read above discussions & the patch. My apologies if I got it wrong, but here are my takeaways:

Why the patch doesn't look terribly useful: from what I can tell from the comments, the patch completely replaces the regular resizing with a "manual" one that breaks default desktop handling like window snapping. (Is that correct?) To me, that sounds like a fundamentally not useful approach at all. I also think that the redraw issue really is the secondary problem here, so I don't see the point in getting stuck on that one if it's so hard, so the patch seems like a dead end.

What I would suggest instead: why can't we have a "let me do non-UI app processing" callback that is guaranteed to still be on the main thread, but is banned from calling any SDL2 event/draw functions? This way one can do a nested call to e.g. netcode or audio or physics updates to keep things running while just skipping drawing & input processing. I think this would fix the pressing issue of total functionality drop-outs like netcode desync, internet connection losses, complete cutscene audio desync, ... while hopefully being way more feasible for SDL2 to provide? The original issue title talks about the blocked main thread after all, and I agree that's the way bigger problem, especially for multiplayer.

Edit: additional note: it would also most likely be way, way easier for many code bases to make use of such a callback if it is still on the main thread, than try to make their entire gameplay happen on a separate thread. It's just a different magnitude of headaches. So while it might seem like not much to work with, it could really help this situation massively.

Edit2: #1059 (comment) this also sounds very alike to what I am suggesting. I'd just prefer a proper, documented solution. It can still be marked as experimental. What about SDL_SetWindowsResizeProcessingHook or something similar as a name? The frequency in which it is called really wouldn't matter much, as long as it is "multiple times a second or more." Most proper code will know how to deal with game loop time fluctuations, after all.

In conclusion, I don't see much value in testing the patch. But is such a callback maybe more feasible? If yes, could this issue be reopened to reconsider that? It won't fix the redraw, but I really think the discussion got too sidetracked on that.

You're welcome to create a callback approach, but please create a new issue and/or pull request for that, since it's fundamentally different from this one.

ell1e commented

@slouken would it make sense to reopen #4614 then? However, I find that reopening this one (instead) is also useful, since I don't see that it started with this drawing-focused fix. That kind of just happened later in the discussion, not the initial "opener" as far as I can see

Are you worried about other platforms? This issue only deals with Windows, but similar things can happen for other platforms. For example on macOS if you click-and-hold on the close, minimize, or maximize window buttons, or open any of the app's menu bar tabs, the OS won't return from its event poll until that's done.

I don't know what a cross-platform 'solution' to event-thread-blocking would be (if one even exists) aside from restructuring your code to not have timing-critical things run on the only thread that has arbitrary blocking due to user and OS interaction, but if one exists I think it'd make more sense to discuss it in a cross-platform context rather than in a Windows issue thread.

ell1e commented

@slime73 I was simply unaware of that, since Linux doesn't seem to have any comparable issues, and I only have test environments for Windows and Linux. However:

don't know what a cross-platform 'solution' to event-thread-blocking would be (if one even exists)

I think from the SDL2 API side this is trivial. Just name it SDL_SetOSBlockingWindowOperationsProcessingHook or something. I mean that's a terrible name, but you get the idea. Now whether macOS's window management API even allows implementing that I wouldn't know. I personally usually don't port my apps to macOS, for various reasons. (I actually also don't know if Winapi allows it, I just read some comments above that suggested it does. I do use quite some Winapi stuff directly, but not the windowing-related things.)

aside from restructuring your code to not have timing-critical things run on the only thread that has arbitrary blocking due to user and OS interaction

In my opinion this is not necessarily as "brilliant" a design as some make it out to be, so let's just agree to disagree here. I think many others would see it like me. And this can often be solved too, by sticking with libraries that respect this problem better, instead of just hand-waving with "uh, throw threads at it or something." (Granted, SDL2 usually does respect this well outside of these few corner cases.) I could discuss this for a long time, but maybe can we just work under the premise that it's useful if people aren't forced to work around this with threads?

(I just want to reiterate that any program that can't deal with the process being starved of CPU time is fundamentally broken no matter what we do or do not do with window resizing. If you replace "user is resizing the window" with "daily virus scanner started running and nothing is moving quickly now" or "system ran out of memory and started swapping heavily to disk" you still have a bug in your program if the audio goes out of sync or network connections drop, etc.)

ell1e commented

@icculus I don't understand. At face value your comment just seems irrelevant to me. Any networked action game will fundamentally drop out of the session if the entire PC hangs... so, huh?

I am really surprised I even need to go into this, since SDL2 seems to encourage a less-threads-is-better design in general, so why is my request apparently so weird? How in particular is it strange to want to not make the game misbehave and drop out just when I resize the window?

Yes, disk I/O should be loading screen only, or in threads. (Or non-blocking I/O! Threads are not always the only answer.) And yes, you can thread game logic and netcode, too. Should you? Should you just to make resizing not break everything massively? How is this scenario so contentious all of a sudden? I'm legit stumped.

So to get back to the issue, would it be possible to add a "let me do non-UI things on the main thread while the OS blocks the window" to SDL2? I find it really hard to believe it's just me finding that useful, even if I just scroll to previous comments. I don't understand this discussion. I don't understand either why "you HAVE to use threads" is an acceptable answer.

ell1e commented

And before anyone suggests to just remove window resizing: if nothing else convinces you here, I think for some users this is an important accessibility feature. I don't really want to need to argue why removing that, like adding excessive threading, is not something the SDL2 API should push devs into. Why can't it push for simple, maintainable, and still ok-behaving programs instead? I think such a processing-only callback would be a great, and also super pragmatic solution. At least I thought that is what SDL2 tries to be about.

How is this scenario so contentious all of a sudden?

The contention I'm discussing here is that people are saying "if my app freezes, [a specific disaster] happens," and while this issue with window resizing can cause an app to freeze, it doesn't change the fact that the app can lose processor time for reasons beyond anyone's control at any moment anyhow, so that app needs to be fixed to deal with the general misfortunes of process scheduling. If the audio goes out of sync when the window is resized, it can also go out of sync because the system is generally overloaded, and that's an app bug. If everyone's network game (not just the one player) fails when someone resizes the window, it will also fail for everyone when there is a little network disruption at the ISP's facility, and not being robust against that is an app bug.

As for the SDL-specific problem: it doesn't have a straightforward solution, because the problem is a specific quirk of the Win32 event queue, and the workarounds noted so far are problematic for various reasons mentioned above.

I listed some options (which I readily say are not straightforward, super-great solutions) that can be done today on the app's side, though. The Hit Test callback is probably what you want here, since it presumably runs on the main thread and happens exactly when a user is resizing the window.

And before anyone suggests to just remove window resizing:

No one is suggesting that.

ell1e commented

The Hit Test callback is probably what you want here,

Yes! However, that'd be a hack that could be removed or no longer work, and it's hard to find. I am suggesting an official callback for this purpose.

it doesn't change the fact that the app can lose processor

It's just not remotely comparable to me. People understand if the entire PC hanging or the network dropping out causes them to be removed from the game. I also don't understand why you suggest this is a bug or solvable. Window resizing however is not an unavoidable resource congestion. So I just don't get that line of thinking.

Edit: most network hiccups, or even disk ones, will by the way be below 100ms too. (I know with HDD suspend there are exceptions, but this is getting off track - loading screens exist.) A window resize can easily be multiple seconds if the user recenters the mouse, gets lost in thought, ... so it's an entirely different scale of disaster for the netcode to deal with. And while the user might be shot in the game, or something else happening while not actively playing, this is also a quite different expected outcome to being suddenly booted from the entire session.

Edit 2: also for completeness's sake, "audio [...] can also go out of sync because the system is generally overloaded": it is easy to make a game loop that tolerates up to 1000ms or so of total process hang with no desyncs, e.g. with a fixed timestep loop that catches up. This eliminates the problem for many users. Also if my PC freezes for multiple seconds and that desyncs audio in a cutscene, I think many people will understand even if they're annoyed. (And possibly get a faster machine.) If I resize the window too long and now cutscene audio desyncs, that's way weirder. So again I just don't find it comparable in impact.
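For readers unfamiliar with the technique named above, here is a minimal sketch of a fixed-step loop that catches up after a stall. It is not taken from any poster's code: the type `FixedStepClock`, the 10 ms step, and the 1000 ms catch-up budget are all assumptions chosen for illustration.

```cpp
#include <cstdint>

// Illustrative constants (assumptions, not from the thread's code).
constexpr std::uint32_t kStepMs = 10;          // length of one simulation tick
constexpr std::uint32_t kMaxCatchUpMs = 1000;  // longest stall we try to replay

// Hypothetical fixed-timestep accumulator.
struct FixedStepClock {
    std::uint32_t accumulatorMs = 0;  // elapsed time not yet simulated
    std::uint64_t ticksRun = 0;       // total simulation ticks executed

    // Feed the wall-clock time that passed since the previous call and run
    // as many fixed ticks as that time covers. Returns the ticks run now.
    template <typename TickFn>
    std::uint32_t advance(std::uint32_t elapsedMs, TickFn tick) {
        // Clamp so a stall longer than the budget does not cause an
        // unbounded burst of catch-up work.
        accumulatorMs += (elapsedMs > kMaxCatchUpMs) ? kMaxCatchUpMs : elapsedMs;
        std::uint32_t ran = 0;
        while (accumulatorMs >= kStepMs) {
            tick();
            accumulatorMs -= kStepMs;
            ++ran;
            ++ticksRun;
        }
        return ran;
    }
};
```

The main loop would call `advance` once per frame with the measured frame time. A stall shorter than the budget is replayed tick by tick, which keeps the simulation in step. A longer one, such as a multi-second window drag, is truncated, and that is exactly the case this issue is about.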

Okay, I don't have anything else to add to the discussion of lag generally.

I would say start with the hit test API and let's revisit this if it doesn't fix the concern. It's a standard SDL API and won't be removed, and it hooks a callback into exactly the win32 quirk you want to deal with, afaik.

If it doesn't, let's discuss further.

ell1e commented

Ok I will test it out, but maybe one last list why I think a "proper" function is still called for:

  1. It is not obvious at what minimum rate hit test will be called. E.g., will it be called even if the user is resizing, but temporarily not moving the mouse? That would then not actually solve the problem.

  2. It is not obvious what I should return to not indicate any change from the regular window shape. Will always returning SDL_HITTEST_NORMAL for example somehow disable the regular outer draggable frame? While I can test this out myself, is this problem really obscure enough that everyone else running into this absolutely has to wonder the same thing? A dedicated processing-while-OS-hangs-event-loop callback could clear such things up better in its documentation.

  3. The wiki page actually discourages using it for any processing: you should try to keep your callback efficient, devoid of allocations, etc. I imagine any proper processing callback would still ask me to keep it minimal to avoid resize lag, but this will currently just scare away anyone who is trying to solve this same problem.

  4. It is completely unclear to me, and I do assume that this won't solve the problem on macOS with e.g. the app menu. If it is even possible to have something similar there, then a "proper" main thread processing callback could also be fired up there and people wouldn't be stuck with a different solution for every platform. While I currently plan no macOS build for various reasons, I do try to keep my code compatible with it if possible just in case, so I'm interested in a cross-platform solution.

  5. It is not listed what APIs of SDL2 are safe to interact with while inside this callback. I imagine anything that actively affects graphics is out. But is passively obtaining window size, or polling keystate ok? A dedicated processing callback could have this documented, since I imagine for the hit test one the average dev using it would just be confused by this info being added on the hit test wiki page.

  6. I also still don't get why this problem is supposedly unimportant enough that it doesn't deserve its own callback with its own SDL wiki page that properly explains its use. It seems to me like almost any multiplayer game dev using SDL2 who discovers this quirk will likely end up asking themselves the same questions, if they even ever realize the hit test allows working around this. (Although I imagine many will just give up, and produce games with either resizing disabled, or where they just hope and pray no user will resize it for too long. I think that's not a good outcome.)

@ell1e on Windows, if you just want a basic callback roughly every X milliseconds when SDL_PumpEvents is blocking due to resizing/similar, you can use SetTimer.

const UINT timer_period = 100;

UINT_PTR timer_id = SetTimer(NULL, 0, timer_period, [](HWND hWnd, UINT uMsg, UINT_PTR nIDEvent, DWORD time) {
    /* hWnd == NULL, uMsg == WM_TIMER, nIDEvent == timer_id, time == GetTickCount() */
    SDL_Log("Callback %p %u %llu %lu", (void *)hWnd, uMsg, (unsigned long long)nIDEvent, (unsigned long)time);
});

SDL_PumpEvents();

KillTimer(NULL, timer_id);

You are still at the mercy of the OS (i.e I noticed the first callback when resizing can take 200ms or so), but it's better than nothing.

ell1e commented

@0x1F9F1 (edit: shortened) maybe SDL2 could use SetTimer to implement this officially, since I imagine going into processing in the hit test callback might impede resizing more than doing so in e.g. a 50ms timer. Let me try another name: SDL_SetProcessingCallbackForWindowBlock. (I'm sorry, naming is hard.)

@ell1e, the basic problem is that we haven't designed a good callback for what you're describing. I agree it's a real problem, and the real fix should be cross-platform and ideally not change how people write SDL applications.

You're welcome to write something that works well for you and attach a patch here for people who run into this problem in the future. We do plan to fix this at some point, we just don't have a good way that meets everyone's desires.

ell1e commented

ideally not change how people write SDL applications.

I feel like it was agreed upon in previous comments that was likely an impossible goal due to SDL2's design of letting the app own the main loop.

The callback I am suggesting (maybe SDL_SetProcessingCallbackForWindowOpBlock?) would be resigning to that reality, and enable people with more single threaded apps to change their code to keep things running by continuing non-UI updates like netcode to avoid disastrous effects of resizing blocks, outside of the IMHO minor no-redraw issue. Compared to the other suggestions like "use threads," I think this approach would allow most affected SDL2 apps (those that malfunction if stopped for too long) to be adapted with really minor changes.

I suggest for this callback:

  1. it should be guaranteed to happen on the main thread only,
  2. it shouldn't fire willy-nilly when nothing is really blocked (to not mess with main loop timing unnecessarily, and optionally also serve as an indicator for the app things are currently blocked)
  3. it fires at some reasonable frequency between 5ms and 20ms to keep faster-paced networking alive while not spamming in a near busy loop,
  4. it should have a wiki page that clearly says what is allowed in the callback (e.g. I assume touching any SDL2 UI or event processing functions would be strictly forbidden, and any blocking I/O heavily discouraged, but it's less obvious for other things like maybe the pressed-keys state),
  5. it should cover at least some of the common blocking cases as a start, like window resizing on MS Windows.
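To make the requirements above concrete, here is a rough, platform-independent model of how such a callback could be gated. Everything in it is hypothetical: `ModalLoopDispatcher`, `enterBlock`, `exitBlock` and `onPlatformTick` are invented names, not SDL functions, and the 10 ms period is just one value inside the suggested 5ms to 20ms range.

```cpp
#include <cstdint>

// Hypothetical callback type; no such API exists in SDL2.
using ModalLoopCallback = void (*)(void *userdata);

struct ModalLoopDispatcher {
    ModalLoopCallback callback = nullptr;
    void *userdata = nullptr;
    bool blocked = false;          // true while the OS owns the event loop
    std::uint64_t lastFireMs = 0;  // time of the most recent callback
    std::uint32_t periodMs = 10;   // assumed rate, inside the 5-20 ms window

    void setCallback(ModalLoopCallback cb, void *ud) {
        callback = cb;
        userdata = ud;
    }

    // The platform layer would call these when a blocking operation
    // (for example a window resize on Windows) starts and ends.
    void enterBlock(std::uint64_t nowMs) {
        blocked = true;
        lastFireMs = nowMs;
    }
    void exitBlock() { blocked = false; }

    // The platform layer would call this whenever it regains control
    // inside the blocking operation, e.g. from a timer message.
    // Returns true if the application callback ran.
    bool onPlatformTick(std::uint64_t nowMs) {
        if (!blocked || callback == nullptr) {
            return false;  // requirement 2: never fire when nothing is blocked
        }
        if (nowMs - lastFireMs < periodMs) {
            return false;  // requirement 3: rate-limit the callback
        }
        lastFireMs = nowMs;
        callback(userdata);  // requirement 1: runs on the caller's (main) thread
        return true;
    }
};
```

In this model the application's callback only ever runs on the thread that drives `onPlatformTick`, so requirement 1 holds as long as the platform layer ticks it from the main thread.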

Hey @icculus , you mentioned SDL_SetWindowsMessageHook; "...which literally just gives you first shot at win32-level events, before SDL does anything with them, and this might be enough."

But there's a slight issue with it: While blocking, the SDL WNDPROC will get called, but the message hook provided by the user won't because it's called in Win_PumpEvents, which is blocked.

I know there's the TranslateMessage guarantee, but perhaps it could be loosened in these blocked conditions? If we see a WM_ENTERSIZEMOVE come in we could set a bool to run the user-provided hook during the SDL WNDPROC. I've done some testing with this, and it seems to work alright, and is better than using the HitTest since that's a bit more hacky. This would allow the usage of WM_TIMER if needed and all that, while still (I think) preserving the TranslateMessage guarantee.
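A small model of the bool-flag idea, written without any Win32 headers so it builds anywhere. The message values are the ones from winuser.h. The hook signature is deliberately simplified and is not the real SDL_WindowsMessageHook signature, and `WndProcHookGate` is an invented name.

```cpp
#include <cstdint>

// Win32 message IDs, copied from winuser.h for portability of the sketch.
constexpr std::uint32_t kWmEnterSizeMove = 0x0231;  // WM_ENTERSIZEMOVE
constexpr std::uint32_t kWmExitSizeMove = 0x0232;   // WM_EXITSIZEMOVE

// Simplified stand-in for a user message hook (assumption for illustration).
using MessageHook = void (*)(void *userdata, std::uint32_t msg);

struct WndProcHookGate {
    MessageHook hook = nullptr;
    void *userdata = nullptr;
    bool inSizeMove = false;  // set between enter and exit of the modal loop

    // Models the check the WNDPROC would perform for every message.
    // Returns true if the message was forwarded to the user hook.
    bool onWndProcMessage(std::uint32_t msg) {
        if (msg == kWmEnterSizeMove) {
            inSizeMove = true;  // forward the enter message itself too
        }
        const bool forward = inSizeMove && hook != nullptr;
        if (forward) {
            hook(userdata, msg);
        }
        if (msg == kWmExitSizeMove) {
            inSizeMove = false;  // exit is forwarded, then the gate closes
        }
        return forward;
    }
};
```

Using a plain bool rather than a nesting counter means a stray or repeated WM_ENTERSIZEMOVE or WM_EXITSIZEMOVE cannot leave the gate stuck at a non-zero depth.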

Alternatively could we set a hint or something to allow the hook to run directly in the WNDPROC?

@playmer does the code in #1059 (comment) work for your situation? Simply adding the hook in the WndProc won't really fix the problem, since the messages are only sent when the window is actually moving/resizing, not when the user is just clicking/holding the window.

@0x1F9F1 it would kind of work, but it's preferable to have the full spectrum of options available. In the case of Inochi-Creator, which is an application, I want clean resizing, or as close to it as I can get. So being able to get events when the user is actively resizing is needed.

That said, you did point out something crucial, we'd also need to call our WindowHook when we see the start and end of blocking, so the user could take advantage of the start and end as well to set up a WM_TIMER. I'm not using that in the app I tested this in, but would need it in another I added a custom callback to SDL for.

@playmer if you want clean resizing, you need to draw on WM_PAINT inside of WndProc, which you can do by checking for SDL_WINDOWEVENT_EXPOSED in an event watcher. If you also want to draw periodically when the window is blocking, you can use SetTimer, either with WM_TIMER or the callback parameter. I've written an example of doing both here.

That's not to say I think doing stuff inside an event watcher is a good idea, but it's currently the simplest way of handling the messages in the proper manner.
I have a basic mockup of a callback which uses the paint and timer messages (ignore the docs I wrote, they are outdated), do you think that would provide the functionality you need? https://github.com/0x1F9F1/SDL/tree/blocking-message-callback

Here's what I suggest : https://gist.github.com/RT222/804bda0bb1ed305e6351dc3a9a07869b

That's how I fix this issue in my engine, and it's the most elegant way I could find. It's not hard to implement, doesn't require multithreading and is easy to use.

It probably wouldn't be too hard to add it to the SDL. What do you think about it?

This is a good approach!

ell1e commented

I like void SDL_SetModalLoopCallback(ModalLoopCallback callback, void *userdata). Note I omitted the second callback though, and I think there needs to be a userdata parameter:

I think it is best to start with a design that does not assume it's about MS Windows' window resizing and redrawing (since as mentioned above on macOS there is a similar but different situation with app menus which should ideally use the same callback), and then if needed do a separate mechanism for like, the resize callback or whatever is then specific to MS Windows' redraw. Something like SDL_SetModalMSWindowResizeCallback(...), maybe. This keeps the base mechanism simpler and more universal for those who just want their netcode and other vital logic updates to not die, and those who really want the full redraw magic could use the additional special functions for the specific use cases to handle viewport resizing, etc.

I'm glad to see I'm not the only one to think this is a good solution for this problem.

@ell1e I totally agree with you. The code I submitted is a simplified version of what I actually use, I wanted it to be clear and straight to the point, maybe a bit too much. I added your suggestions to the code.

ell1e commented

I like your changes. Just as a note: void *userdata = NULL in the parameter list is afaik not valid C99, but that's a minor nitpick. Now if all of this could be accessed by just using SDL_SetModalLoopCallback/SDL_SetModalLoopResizeCallback and providing the Resize/Draw callbacks and userdata and SDL2 does everything else in that gist, that'd be amazing in my book. I hope something like this can be added, it looks really good to me. (Disclaimer: haven't test-run the code yet.)

Yes, that would be ideal. I can't promise anything, but if I can find the time to work on it, I will make a PR to try to push this in the SDL.

But note that there are probably some corner cases that aren't handled in the EventWatch function. For example, I found a question on stackoverflow where someone encountered a case where WM_ENTERSIZEMOVE and WM_EXITSIZEMOVE aren't paired. I couldn't reproduce it, but it's still probably something we should take into account.

I like your changes. Just as a note: void *userdata = NULL in the parameter list is afaik not valid C99, but that's a minor nitpick.

Indeed, that's what too much C++ does to you. I fixed it.

peppy commented

In case it helps someone, we seem to have been able to work around this issue using SDL_SetEventFilter (ppy/osu-framework@c938e6c). It'd be great if a solution can be reached to fix this in a sane way.

I dug up this issue after more than 8 years of ignoring this Windows issue and decided to revisit it. For anybody still suffering from this Windows OS peculiarity, I think it is fair to mention that Allegro5 does not suffer from this problem on Windows. I just compiled Allegro5 and tested that I am unable to block the rendering while dragging, resizing, interacting with the SYSMENU or even holding down the close button at the top right of the window menu.

I will leave the link to the win32 callback here for reference in hopes someone wiser about either SDL, Allegro, etc, will find it useful enough to adopt into SDL if possible.

https://github.com/liballeg/allegro5/blob/master/src/win/wwindow.c#L962

tycho commented

It seems the Allegro code just spawns a second thread which creates the window and handles the event queue, which is a known approach.

The main problem I've encountered when implementing that kind of approach in my own code (which uses SDL) is that some SDL functions can only be called from the thread that created the window (e.g. SDL_StopTextInput or SDL_SetWindowPosition from the non-window thread will cause the calling thread to hang forever). So a multi-thread approach works, but you have to be supremely careful with what APIs you use.
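One common way to live with that restriction is to never call window functions from other threads directly, and instead hand them to the window thread as queued jobs. The sketch below shows only the queue itself. `WindowThreadQueue` is a hypothetical helper, not part of SDL, and the jobs posted to it would wrap whatever window calls the application needs.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical helper: any thread posts work, the window thread runs it.
class WindowThreadQueue {
public:
    // Safe to call from any thread.
    void post(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push(std::move(job));
    }

    // Call only from the thread that created the window, for example once
    // per iteration of its event loop. Returns the number of jobs executed.
    int drain() {
        std::queue<std::function<void()>> pending;
        {
            // Take the whole batch under the lock, then run it unlocked so
            // a job may itself call post() without deadlocking.
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(jobs_);
        }
        int ran = 0;
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
            ++ran;
        }
        return ran;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> jobs_;
};
```

Note that queued jobs only run when the window thread calls `drain`, so they are delayed for as long as that thread sits inside a modal move/resize loop. The posting thread must not wait synchronously for a job to complete, or it inherits the same hang.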

Moving to 3.0 to explore the approach Allegro uses. If someone wants to investigate this sooner, feel free!

ell1e commented

Just so it doesn't get lost, I really hope this approach gets some consideration: #1059 (comment) I think it fits SDL2's minimally-threaded API the best. And it would still be possible for an app to use a custom threaded worker on top if desired, while it may not work in the reverse: for example, the main codebase of mine I want to try this in doesn't support threaded main logic and a rework might not be feasible. But maybe that's just me, though I imagine this would also affect others.

Edit: this comment was written under the assumption that the Allegro approach would expose a threaded callback to an SDL2 app for use. If this isn't the case, please ignore my misguided comment.

Any updates on this?

I think I finally found a decent workaround for this using fibers and an event filter. Here's a minimal example: https://codeberg.org/hstormo/sdlwin/src/branch/main/main.odin It's written in Odin because that's what I use in my spare time, but it's fairly close to what it would look like in C.

In short, the event filter allows you to access window messages as soon as SDL receives them, and by running the event loop on a separate fiber, you can return to the main loop while the modal loop is still running. It doesn't require you to set up callbacks for drawing or anything; your main loop looks the same as usual, except you switch to the event fiber to run the event loop.

Still, there seem to be some cases where the contents freeze, such as clicking and holding one of the window buttons. From what I can tell, in those cases the main thread is completely blocked, and it does not even receive WM_TIMER messages from a running timer. To fix that I'm pretty sure the only way is a proper multithreaded approach such as what Allegro does.

Here is the fiber trick implemented internally in SDL: main...hstormo:event_fiber With this patch your own code doesn't need to change at all. It works well in every example I tried it on.

It doesn't quite work right if you use SDL_WaitEvent or SDL_WaitEventTimeout, but you can use WaitMessage or MsgWaitForMultipleObjects instead to get around the problem. Maybe someone who knows that codepath better than me can get it right.

Perhaps @slouken can comment on whether this is worth opening a pull request for. I don't know if using fibers falls under the "risky behaviors" that have been mentioned before. One gotcha is that the event pump must only be called from the same thread that initiated the video device, since that thread runs the fibers -- but that is already a documented requirement.

DOSBox-X developer here: The in-tree SDL1 library was modified for Win32 builds to separate the SDL window into a parent and child window, and a separate thread to manage one of them, so that moving/resizing the window or using the menus does not halt emulation. This trick is how DOSBox-X is able to continue running normally even when resizing. Feel free to adapt if desired.

https://github.com/joncampbell123/dosbox-x/tree/master/vs/sdl

Not every SDL application needs this of course, so if it is added let it be an option on, say, SDL_Init()

It seems very odd that this kind of industry leading library is unable to let my code run when the window is dragged. For like a decade, from what I'm reading here? Just don't render anything, let all SDL code fail horribly, anything, but for FSM sake, don't block my code!

SDL doesn't know ahead of time that you're entering the message pump to be dragged for an indeterminate amount of time. This is a limitation of the design of the SDL_Event loop interacting with the Windows event loop. There are many workarounds, but they'd need to be implemented and all require changes on the part of the app.

Perhaps the design could be revisited in SDL3 to not interact poorly, but I'm not sure what it would look like.

I am sure the technicalities of why this is a problem are sound, and I am sure it's the usual microsoft thing that's causing it. However, the consequences are absolutely terrible. All I'm saying, all my criticism is regarding priorities. I'm sure music visualizations and things like that are loving it. If they are smart, they'll probably let the music go bwbwbwbwbw until you let the window go. Heck, I'm going mad just having my avg time measures f'ed up for seconds when I have to move the window out of the way of the console after starting all the time.

Btw. all of that needs to be combined with all I've read how you must not move the rendering or the event loop to different threads. All of this appears pretty extreme to me. Which is of course measured by the standing SDL seems to have. It's not like I would complain about some dude's engine that way. Anyway, cheerio everyone. Just felt that this whole thing needed quite the kick in the behind.

There is a reason dragging/resizing the window or using the menus blocks execution of your SDL application.

The way Windows handles those interactions, and always has handled it going back to Windows 1.x even, is that DefWindowProc() goes into its own event handling loop to handle that action. This of course blocks the SDL event handling loop.

The way DOSBox-X handles it is by modding the SDL library to maintain both a parent top level window and a child window inside, and then a separate thread handles message handling. If DefWindowProc() blocks for window size/move and menu interaction, then that thread is blocked while the main SDL application continues to run unimpeded.

Perhaps official SDL development can handle it differently or possibly cleaner, but that's how you can avoid the blocking issue entirely.

Appreciate the help, but really I'm not using a cross platform thingy that "is mainly used to handle cross platform window management" to work around window management tailored to specific platforms. The only solution that works is that SDL is just able to move a running program across the screen, even on an outlandish platform like windows.

We're still discussing what the appropriate way to work around this Windows limitation should be for SDL3, which is why this issue is still open.

While we discuss that, I'm going to lock this thread, as I think we have enough feedback telling us that people feel strongly about finding a resolution.

I've added a solution that dovetails nicely with the new main callbacks in SDL 3.0 and if you're not using that you can set an event watcher to handle expose events and draw then.

Thanks for all the feedback!

Being angry because you are completely ignorant on a topic doesn't help you or anyone else. This issue is indeed related to Win32 modal loops, and also applies if you use Win32 directly. A little search on the internet, or just reading this conversation, would have told you about that. Moreover, this problem is solved now, so I don't see the point of your intervention.

Anyway, props to the SDL team for your amazing work on this library, you don't deserve such rude comments.

@RT2Code It should be the SDL team that apologizes for the rude comments themselves. If you make a multiplatform library like this, it is your responsibility to make sure that your event loop interacts correctly with ALL of the platforms. SDL had this issue for several years and they kept brushing it off, as many of the other people in this conversation have pointed out. @icculus 's comments above, berating people because they expect this library to not block on window resizes (or even holding down one of the window buttons on macOS), and even trying to compare this to someone unplugging an ethernet cable, was extremely disappointing and completely uncalled for.

And no, the problem isn't solved. It still exists in SDL2. We still have to use a workaround for SDL2.
Your "example" StackOverflow link (as if StackOverflow is the best place to get programming advice, lmao) isn't using the Windows API correctly. I think you will find that the Win32 API documents what is expected and what isn't, unlike SDL.
This is well explained in one of the StackOverflow answers:

When DefWindowProc handles WM_SYSCOMMAND with either SC_MOVE or SC_SIZE in the wParam, it enters a loop until the user stops it by releasing the mouse button, or pressing either enter or escape. It does this because it allows the program to render both the client area (where your widgets or game or whatever is drawn) and the borders and caption area by handling WM_PAINT and WM_NCPAINT messages (you should still receive these events in your Window Procedure).

It works fine for normal Windows apps, which do most of their processing inside of their Window Procedure as a result of receiving messages. It only affects programs which do processing outside of the Window Procedure, such as games (which are usually fullscreen and not affected anyway).

Many people have solved this problem easily. It only took the SDL team many years to fix it.

icculus 's comments above, berating people because they expect this library to not block

I didn't berate people, I offered several possible technical workarounds, and I locked this thread because it keeps generating unhelpful commentary like this, which is also why I'm locking it again now.

This is fixed for the SDL 2.30 release, in 509c70c