Dear ImGui interface
audetto opened this issue · 17 comments
The Dear ImGui interface has been merged to master, see https://github.com/audetto/AppleWin/blob/master/source/frontends/sdl/README.md
A couple of screenshots below:
In a separate window:
The currently available options are really just to get a feeling for the library.
Some thinking is required about how to organise the settings.
Imgui progress looks great. Providing some mechanism for proportional scaling would be good (e.g. with black bars on either side when fullscreen), but I'm very excited about this approach. When emulator is toggled to run in a separate window, opening the emulator window (within a window) up at a larger default size would help for those not familiar with Imgui.
For others' reference (as well as my own), once you recursively clone the code, you'll also need the imgui library. I'm just copying the imgui directory into the AppleWin directory. From there, build (from inside a new build directory) with:
cmake -DIMGUI_PATH=../imgui ..
make
Then run with 1 of these commands:
./sa2 --imgui
./sa2 --imgui --qt-ini (launch with configuration settings from Qt version)
Running the above comes within 0.5 Hz of the expected values (runs well) on a moderately overclocked Pi4.
For testing purposes I ran through some of the use cases:
./sa2 -m --imgui --qt-ini (launch with configuration settings from Qt version)
The line above causes noticeable problems: the output reports ~20 Hz of deviation (faster than expected), and I'm not seeing separate processes in 'top'.
Testing with the various drivers (w/o imgui and w/o -m argument):
0: Expected clock: 1020484.45 Hz, 27.96 s Actual clock: 1019216.34 Hz, 28.00 s
1: Expected clock: 1020484.45 Hz, 60.94 s Actual clock: 1020258.14 Hz, 60.96 s
2: Expected clock: 1020484.45 Hz, 39.84 s Actual clock: 1019883.32 Hz, 39.86 s
3: Expected clock: 1020484.45 Hz, 37.55 s Actual clock: 1017197.87 Hz, 37.67 s
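To make the four driver results easier to compare, the deviation of the actual clock from the expected clock can be computed directly from the reported lines. This is only a small sketch: the function name `clock_deviations` and the regular expression are illustrative and not part of AppleWin, and the numbers are copied from the lines above.

```python
import re

# Driver results copied verbatim from the report above.
RESULTS = """\
0: Expected clock: 1020484.45 Hz, 27.96 s Actual clock: 1019216.34 Hz, 28.00 s
1: Expected clock: 1020484.45 Hz, 60.94 s Actual clock: 1020258.14 Hz, 60.96 s
2: Expected clock: 1020484.45 Hz, 39.84 s Actual clock: 1019883.32 Hz, 39.86 s
3: Expected clock: 1020484.45 Hz, 37.55 s Actual clock: 1017197.87 Hz, 37.67 s
"""

# Hypothetical pattern matching the log format shown in this thread.
PATTERN = re.compile(
    r"(\d+): Expected clock: ([\d.]+) Hz, ([\d.]+) s "
    r"Actual clock: ([\d.]+) Hz, ([\d.]+) s"
)


def clock_deviations(text):
    """Return {driver: (deviation_hz, deviation_percent)} for each result line."""
    deviations = {}
    for match in PATTERN.finditer(text):
        driver = int(match.group(1))
        expected = float(match.group(2))
        actual = float(match.group(4))
        diff = actual - expected
        deviations[driver] = (diff, 100.0 * diff / expected)
    return deviations


if __name__ == "__main__":
    for driver, (hz, pct) in sorted(clock_deviations(RESULTS).items()):
        print(f"driver {driver}: {hz:+.2f} Hz ({pct:+.3f} %)")
```

With these figures, driver 1 shows the smallest deviation and driver 3 the largest, all on the slow side.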
Testing w/o imgui, but w/ -m argument and the most ideal renderer:
./sa2 -m --sdl-driver 1 --qt-ini
Expected clock: 1020484.45 Hz, 954.80 s Actual clock: 1020485.88 Hz, 122.37 s
On the Pi4 I'm just not seeing a performance difference. There was noticeable "flutter" around the timing
./sa2 -l --sdl-driver 1 --qt-ini
Expected clock: 1020484.45 Hz, 132.59 s Actual clock: 1020433.32 Hz, 132.60 s
Imgui progress looks great. Providing some mechanism for proportional scaling would be good (e.g. with black bars on either side when fullscreen), but I'm very excited about this approach. When emulator is toggled to run in a separate window, opening the emulator window (within a window) up at a larger default size would help for those not familiar with Imgui.
Yes, still learning a bit how this works.
For others' reference (as well as my own), once you recursively clone the code, you'll also need the imgui library. I'm just copying the imgui directory into the AppleWin directory. From there, build (from inside a new build directory) with:
cmake -DIMGUI_PATH=../imgui ..
make
I could add a submodule and avoid the configuration step. Planning to do it.
Then run with 1 of these commands:
./sa2 --imgui
./sa2 --imgui --qt-ini (launch with configuration settings from Qt version)
Running the above comes within 0.5 Hz of the expected values (runs well) on a moderately overclocked Pi4.
One needs to be careful about this. The speed adapts to the wall clock, so it constantly adjusts to keep the overall average correct. I don't think you can read much out of this. There is a parameter, --fixed-speed, that runs a fixed 16 ms every frame, but due to rounding it drifts. It might be better for analysis, though.
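As a rough illustration of the rounding drift mentioned here: a 60 Hz frame lasts about 16.67 ms, so emulating a fixed 16 ms per frame falls short of one second of emulated time per second of wall time. The sketch below assumes the display presents exactly 60 frames per second, which is an assumption and not something the emulator guarantees; the function names are made up for the example.

```python
def emulated_ms_per_wall_second(frame_ms, frames_per_second):
    """Emulated time (ms) advanced during one second of wall-clock time."""
    return frame_ms * frames_per_second


def drift_percent(frame_ms=16, frames_per_second=60):
    """Relative drift of emulated time against wall time, in percent.

    Negative means the emulation runs slow. The 60 frames per second
    default is an assumption about the display, not an AppleWin constant.
    """
    emulated = emulated_ms_per_wall_second(frame_ms, frames_per_second)
    return 100.0 * (emulated - 1000.0) / 1000.0


if __name__ == "__main__":
    print(f"fixed 16 ms frames at 60 FPS: {drift_percent():+.1f} % drift")
```

Under that assumption the emulated clock would run about 4 % slow, which is why the adaptive mode hides such effects and the fixed mode exposes them.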
For testing purposes I ran through some of the use cases:
./sa2 -m --imgui --sdl-ini (launch with configuration settings from Qt version)
Results of this above line cause noticeable problems. Output reports ~20Hz of deviation faster than expected and I'm not seeing separate processes with 'top'
How does this differ from the first run? --sdl-ini does not exist.
Testing with the various drivers (w/o imgui and w/o -m argument):
0: Expected clock: 1020484.45 Hz, 27.96 s Actual clock: 1019216.34 Hz, 28.00 s
1: Expected clock: 1020484.45 Hz, 60.94 s Actual clock: 1020258.14 Hz, 60.96 s
2: Expected clock: 1020484.45 Hz, 39.84 s Actual clock: 1019883.32 Hz, 39.86 s
3: Expected clock: 1020484.45 Hz, 37.55 s Actual clock: 1017197.87 Hz, 37.67 s
With --imgui the renderer selection is ignored and it basically runs in GLES2 mode all the time.
Testing w/o imgui, but w/ -m argument and the most ideal renderer:
./sa2 -m --sdl-driver 1 --qt-ini
Yes, GLES2 is the best renderer for native SDL.
Expected clock: 1020484.45 Hz, 954.80 s Actual clock: 1020485.88 Hz, 122.37 s
On the Pi4 I'm just not seeing a performance difference. There was noticeable "flutter" around the timing.
./sa2 -l --sdl-driver 1 --qt-ini
Expected clock: 1020484.45 Hz, 132.59 s Actual clock: 1020433.32 Hz, 132.60 s
--imgui and --sdl-driver 1 are running basically the same GLES2 code, except for a couple of extensions I used to simplify texture update: https://github.com/audetto/AppleWin/blob/master/source/frontends/sdl/imgui/image.cpp#L12
I think the best way to truly compare performance is to check the FPS achieved at full speed. The number is only displayed for ImGui.
--gl-swap 0, valid for both SDL and ImGui, disables vsync and lets the emulator run at full speed.
If you combine this with --fixed-speed you should see the true achievable speed, and as long as it is > 60 FPS, it should work well.
I have found that the speed (per pixel) on a Pi4 degrades after a certain size: https://www.raspberrypi.org/forums/viewtopic.php?f=68&t=303201&p=1819135#p1819135 and for full screen the framerate drops to 40 FPS, but the adaptive speed should compensate a bit.
If you want to achieve maximum FPS on a Pi4, read this: https://www.raspberrypi.org/forums/viewtopic.php?f=68&t=304534&p=1823765#p1823139
I get fullscreen at 141 FPS!
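To put the FPS figures in this comment into perspective, they can be converted to per-frame times and compared against the 60 FPS threshold mentioned above. This is a minimal sketch; `frame_time_ms` and `headroom` are hypothetical helper names, and 60 FPS is taken from the "> 60" rule of thumb in the text.

```python
TARGET_FPS = 60.0  # threshold quoted above for "it should work well"


def frame_time_ms(fps):
    """Time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def headroom(fps, target=TARGET_FPS):
    """Ratio of achieved FPS to target FPS; below 1.0 means frames are missed."""
    return fps / target


if __name__ == "__main__":
    # 40 and 141 FPS are the fullscreen figures reported in this comment.
    for fps in (40.0, 60.0, 141.0):
        print(f"{fps:6.1f} FPS -> {frame_time_ms(fps):5.2f} ms/frame, "
              f"headroom x{headroom(fps):.2f}")
```

At 40 FPS each frame takes 25 ms, which exceeds the roughly 16.7 ms budget of a 60 Hz display, whereas 141 FPS leaves more than twice the required headroom.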
I have proposed a few minor patches that move the Imgui codebase in the direction of having menus to access functionality:
https://github.com/webspacecreations/AppleWin/tree/patch-1
https://github.com/webspacecreations/AppleWin/tree/patch-2
Since the Settings window doesn't have a close button, I've used the menu item to serve a temporary dual role of opening and closing the window. The Imgui demo window is there as well, since you still had the code (and it's probably nice to have while in progress). There's also an Edit menu, but it doesn't actually do anything (except demonstrate some Imgui conventions).
I'm guessing that you're in a better position to set the appropriate render offset for the emulator render area (so that menu doesn't overlap the emulator output), but I can take a stab at it if you'd like.
If you want to achieve maximum FPS on a Pi4, read this: https://www.raspberrypi.org/forums/viewtopic.php?f=68&t=304534&p=1823765#p1823139 I get fullscreen at 141 FPS!
That sounds really promising. I feel like I tried that setting in a series of experiments and the Pi refused to show video output upon restart, but that might have been on my Pi400 (and possibly KMS has been updated since then).
Full disclosure: I corrected my previous comment to prevent confusion.
Hiding the settings is good, and I want to keep the same key bindings as AW as much as possible. So I think F8 will do that too.
And I will want to hide any menu when running fullscreen (or have an option for it).
But now the key processing is common with the other SDL2 Frame. Maybe we should just settle for SDL+ImGui and forget the other renderer.
And yes, KMS seems to be under heavy development at the moment; I have never tried it before.
Looks great. What about a runtime switch to hide the menu, e.g. --no-menu? What other renderer are you using besides SDL2?
Maybe this is opening up the need for yet another thread, but after running apt-get update, apt-get upgrade, and switching the config.txt from fkms to kms, I'm still not getting graphical output on Pi4. Did you install an additional package (or use the raspi-config tool) to add kms support?
My config is:
# Enable audio (loads snd_bcm2835)
dtparam=audio=on
[pi4]
# Enable DRM VC4 V3D driver on top of the dispmanx display stack
dtoverlay=vc4-kms-v3d
max_framebuffers=2
[pi3]
# Enable DRM VC4 V3D driver on top of the dispmanx display stack
dtoverlay=vc4-fkms-v3d
max_framebuffers=2
[all]
dtparam=act_led_trigger=actpwr
My kernel version is: Linux raspberrypi 5.10.11-v7l+ #1399
When you say no "graphical output", what do you mean?
My kernel is identical to yours. Thanks for sharing the config directives. I copied this over and the Pi4 now boots to XFCE windowed environment (previously I only got a blinking cursor). Things seem to be working fine, except AppleWin (only tested in windowed mode) now slows to a crawl (although with frame skipping it just means lots of skipped frames). As a precaution, I rebuilt AppleWin and it's not making a difference. I tried executing with ./sa2 --imgui --qt-ini and ./sa2 -m --sdl-driver 1 --qt-ini but it doesn't matter. Are you unlocking the VSYNC in your own tests (60Hz shows as refresh rate for me, which generally coincides with locking to vsync)?
I tried to explain it at the bottom of #22 (comment)
./sa2 --imgui --gl-swap 0 --fixed-speed
Thanks again for the external link to the KMS discussion and especially your config. Looks like some element of overclocking (or possibly a less obvious directive) in my config was interfering with KMS. There is definitely improved SDL speed (I tested a few other SDL-based emulators and it makes a substantial difference). Getting KMS "production ready" should really be a high priority in the Pi community, as it impacts so many projects and use cases.
As for AppleWin, it looks like there's something about my --qt-ini config vs a "fresh" run. The most obvious candidate is the video renderer, since my Qt config is "Color (Composite Idealized)", whereas the default 50% Color TV runs great. Composite idealized ran fine under fake KMS, but doesn't work well with KMS. Maybe composite pushes the CPU just beyond the line where the previous overclocked settings were picking up the slack. That's just a guess... I need to go back and refamiliarize myself with the AppleWin function keys to test the video display options more systematically.
You would have to post a picture if possible.
I have not seen any difference in video quality.
F9 cycles the video mode: https://htmlpreview.github.io/?https://raw.githubusercontent.com/audetto/AppleWin/master/help/keyboard.html
Not all shortcuts are implemented.
It's been a busy few months, but now that I'm circling back to this, the ImGui interface has come a LONG way and is very usable on a Pi4. Since the ImGui interface now provides much of the same functionality as the Qt version, I was able to re-test the various color modes directly (F9 doesn't seem to cycle, but the menu options work perfectly) and am pleased to report that changing color modes (including composite idealized) doesn't impact responsiveness.
Key for me being able to reliably get KMS loading is to use your simplified config exactly. Running on a factory clocked Pi4 with KMS, CPU utilization settles to around 60% of a CPU core at the default 2x window size.
I don't see a way to adjust audio latency in the ImGui version interface, but I suspect the audio latency is defaulting to 200ms, which is slightly noticeable on the tests I've run.
Exciting progress! Is there anything you need help testing at this point, either on Pi4 or Pi400?
It's been a busy few months, but now that I'm circling back to this, the ImGui interface has come a LONG way and is very usable on a Pi4. Since the ImGui interface now provides much of the same functionality as the Qt version, I was able to re-test the various color modes directly (F9 doesn't seem to cycle, but the menu options work perfectly) and am pleased to report that changing color modes (including composite idealized) doesn't impact responsiveness.
Re F9: this is odd. Both F9 and Shift-F9 work on my Pi400.
I don't see a way to adjust audio latency in the ImGui version interface, but I suspect the audio latency is defaulting to 200ms, which is slightly noticeable on the tests I've run.
There is no way. AppleWin adapts itself to the speed of the audio playback: it asks Windows how much is still in the buffer, which SDL does not really implement. So I introduced a Windows-emulation buffer, which probably causes the delay.
You can see it in the settings->audio tab.
I am not too sure how to improve it.
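For a sense of scale, the suspected 200 ms latency can be converted to a number of buffered bytes. The sketch below assumes 44100 Hz, stereo, 16-bit samples; these stream parameters are assumptions for illustration and are not confirmed anywhere in this thread, and the helper names are invented for the example.

```python
# Assumed stream format (not confirmed in this thread).
SAMPLE_RATE_HZ = 44100
CHANNELS = 2
BYTES_PER_SAMPLE = 2  # 16-bit samples


def bytes_for_latency(latency_ms):
    """Number of buffered bytes that corresponds to a given latency."""
    bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
    return bytes_per_second * latency_ms // 1000


def latency_ms_for_bytes(buffered_bytes):
    """Latency (ms) implied by a given number of buffered bytes."""
    bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
    return 1000.0 * buffered_bytes / bytes_per_second


if __name__ == "__main__":
    print(bytes_for_latency(200), "bytes buffered at 200 ms")
    print(latency_ms_for_bytes(bytes_for_latency(200)), "ms")
```

Under these assumptions, 200 ms corresponds to 35280 bytes, or about 12 emulated frames at 60 FPS, so any intermediate buffer of that size would account for the delay on its own.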
Re F9: this is odd. Both F9 and Shift-F9 work on my Pi400.
My bad, you're right. I recently switched to a keyboard that has a dedicated Fn key.
I am not too sure how to improve it.
GSplus monitors SDL2 audio buffer (https://github.com/digarok/gsplus/blob/master/src/sdl2snd_driver.c) if that's any help. Don't believe it's doing too much other than copying the audio queue to the SDL2 sound buffer. Not sure how your Windows-emulation buffer works, but is it something that could be disabled with a runtime switch (and would that make sense)?
This code is more sophisticated than I fully understand:
https://github.com/AppleWin/AppleWin/blob/master/source/Speaker.cpp#L693-L814
https://github.com/AppleWin/AppleWin/blob/master/source/Mockingboard.cpp#L903-L1022
It tries to adapt the audio generation to the actual speed of the sound card.
It does more than a simple circular buffer (see the static variable dwByteOffset).
And remember that on Linux, we run 1 frame at a time.
Might be worth splitting this out as a separate audio thread, since audio seems to be a sticking point on many A2 on Pi projects.
The Mockingboard code is clearly more complex as it's processing multi-channel sound, so probably easier to focus first on single channel Speaker. Is your own code modifying the AppleWin audio code (if so, where) or is the problem that the AppleWin code is introducing the audio latency (perhaps because it isn't reading the audio buffer size correctly)? It looks to me like the Speaker code is trying to balance buffer underruns and also not preload so much audio as to introduce unnecessary latency elsewhere. I can take a stab at this, but want to make sure I have a full understanding of the scope.
No activity and no problems reported for a while.