farmerbriantee/AgOpenGPS

5.2.0 and above versions lag

mirh opened this issue · 7 comments

mirh commented

I'm using AOG on a Z8350 Windows 10 tablet, connected over USB/serial to a simpleRTK2B-F9P set to a 10 Hz navigation rate.
5.1.5 is just butter smooth while moving, but everything after that noticeably stutters.

Yes, I have seen that its release notes specifically mentioned some kind of cap to "fix delays". But even assuming that was the real solution, there's something absolutely fishy going on with the way updates/refreshes happen.
I have monitored every single component of the system, and nothing is anywhere close to being a bottleneck (be it GPU, disk or any single core of the CPU).

Yet where the older version can paint a new frame to screen in roughly 90 to 110 ms (exactly where you'd expect it for 10 Hz), everything newer (even with Minimum Frame Pause set to 90 ms) takes 200-400 ms.
I definitely cannot confirm any overload because of "making a zillion frame updates". And I get that on a general-purpose PC other processes may occasionally steal resources and degrade the user experience for the foreground app, but this isn't about flukes. It's always like that, with no apparent cause.
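
To put the frame times quoted above into frames per second, here is a minimal sketch of the arithmetic. The helper name `fps_from_frame_ms` is made up for illustration; the millisecond figures are the ones reported in this comment.

```python
def fps_from_frame_ms(frame_ms: float) -> float:
    """Convert a frame interval in milliseconds to frames per second."""
    return 1000.0 / frame_ms


# A 10 Hz receiver delivers one fix every 100 ms, so 10 fps is the ceiling.
expected = fps_from_frame_ms(100)   # 10.0 fps
# Reported range for 5.2.0 and later: 200-400 ms per frame.
newer_best = fps_from_frame_ms(200)   # 5.0 fps
newer_worst = fps_from_frame_ms(400)  # 2.5 fps
```

So the newer versions are painting at roughly a quarter to a half of the rate the receiver is feeding them.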

mirh commented

The latest version seems to be a bit better (every now and then I can actually hit 10fps) but it's still greatly behind the near perfection of 5.1.5.

I also noticed that when I updated my receiver and accidentally reset its refresh rate back to 1 Hz, 5.1.5 behaved kinda like the broken version.
So... together with my previous observation about CPU time, it sounds like the lag is actually just a symptom of AgIO not pushing enough updates up to AOG? (The situation in the slowest cases seemed to improve slightly if I restarted the former, but maybe it was just placebo.)

As if there was a race condition or something in the time/parse loop (more or less like here, except I can 100% reproduce this with my ardusimple). Is it really that hard to implement a LIFO/mailbox queue?
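
For reference, a mailbox here means a single-slot queue where a new message overwrites any unread one, so the consumer always gets the freshest fix instead of working through a backlog. The following is only a minimal sketch of that idea in Python (not the project's C#); the `Mailbox` class and its `dropped` counter are hypothetical names, not anything that exists in AgIO or AOG.

```python
import threading


class Mailbox:
    """Single-slot mailbox: a new message overwrites any unread one."""

    def __init__(self):
        self._cond = threading.Condition()
        self._value = None
        self._has_value = False
        self.dropped = 0  # unread messages that were overwritten

    def put(self, value):
        """Store value, replacing whatever the consumer has not read yet."""
        with self._cond:
            if self._has_value:
                self.dropped += 1
            self._value = value
            self._has_value = True
            self._cond.notify()

    def take(self, timeout=None):
        """Block until a message is available; return None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_value, timeout):
                return None
            value, self._value = self._value, None
            self._has_value = False
            return value


# Producer gets ahead of the consumer by three fixes.
box = Mailbox()
for fix in ("fix-1", "fix-2", "fix-3"):
    box.put(fix)
latest = box.take(timeout=0.1)     # only the newest fix survives
nothing = box.take(timeout=0.01)   # slot is empty again
```

The trade-off is deliberate: stale position fixes are discarded rather than rendered late, which suits a display that only cares about the current position.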

mirh commented

Aaaand I figured out how to reproduce this everywhere (well, with a simpleRTK2B and their Rover 10Hz configuration at least).
Set up AOG like normal, then open Task Manager and set the affinity of AgIO.exe to a single thread.

Limiting it to three threads is already enough to show a noticeable difference between 5.1.5 and the newer versions on my i5-10300H workstation. With just one thread it's far worse.
Like, 20 to 50 times worse... with framerate better measured in "seconds per frame".

Then, yes, I also found that a sleek workaround is disabling GLL, GSA and GSV, because at the end of the day only GGA is strictly necessary. But really? I can't stress enough how crazy it is that a program can choke while using only 3% of the power available.
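
The workaround above is applied on the receiver itself. As a software-side illustration of the same idea, here is a minimal Python sketch that keeps only GGA sentences from a stream of NMEA lines. The function names and the sample lines are hypothetical (the payloads are placeholders, not real fixes), and nothing here is taken from AgIO.

```python
def sentence_type(sentence: str) -> str:
    """Return the three-letter type, e.g. 'GGA' from '$GNGGA,...'.

    The address field is a talker ID (GP, GN, GL, ...) followed by the type,
    so the last three characters identify the sentence.
    """
    address = sentence.lstrip("$").split(",", 1)[0]
    return address[-3:]


def keep_only(lines, wanted=frozenset({"GGA"})):
    """Drop every sentence whose type is not in `wanted`."""
    return [ln for ln in lines if sentence_type(ln) in wanted]


# Placeholder sentences: only the address field matters for this sketch.
sample = [
    "$GNGGA,placeholder",
    "$GNGLL,placeholder",
    "$GNGSA,placeholder",
    "$GPGSV,placeholder",
    "$GLGSV,placeholder",
]
kept = keep_only(sample)
```

In this sample one line out of five survives, which gives a feel for how much less the parser has to handle once the extra sentences are turned off.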

So... you have to do something “weird” in Windows to force a program onto one thread to make the program run very poorly?

We measured the time between a message coming in at AgIO and being pushed to AOG on a regular laptop with no big specs: it took less than 1 ms. So I think the problem isn’t AgIO. The 10 FPS you mentioned is the 10 Hz GPS update?

mirh commented

> So... you have to do something “weird” in Windows to force a program onto one thread to make the program run very poorly?

To simulate the weird behaviour that I'm seeing on my Cherry Trail tablet on a much beefier system, I have to do that. Yes.
I know it sounds stupid, but I'd like to underline again that this is not about having too little horsepower (even though the factor may be slightly related).

When I get lag in the aforementioned scenarios, CPU usage (even on that single core) is super low.

> The 10 FPS you mentioned is the 10 Hz GPS update?

Yes. I measure framerate with MSI Afterburner, so that even when I'm stationary I can tell without a doubt when a new frame is painted.

mirh commented

This is not completed at all..
I just tested 5.7 on a 9600K limited to two threads, and I can barely do 4 fps.
With just one, it sometimes dips so low that AOG thinks the signal is not even there.

And again to reiterate: nothing in my computer is under any significant load.

mirh commented

After an hour of bisecting I finally found the culprit:
c4129a6#diff-f6be31d3628f90798babe6156ea2a4f24aec1b2c6ac18bd2f19dbd49a6cb60d7

It's enough to remove these two lines and, for whatever reason, the problem is solved.

rawBuffer = "";
return;
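
The thread does not show the code around those two lines, so the following is only a guess at why a buffer reset plus early return could hurt. It is a Python sketch under loud assumptions: that `rawBuffer` accumulates serial text, that sentences are CRLF-terminated, and that the reset fires when a fragment does not start with `$`. The `drain` function and its `discard_on_garbage` flag are hypothetical.

```python
def drain(raw_buffer: str, discard_on_garbage: bool):
    """Pull complete CRLF-terminated sentences out of raw_buffer.

    Returns (sentences, remaining_buffer).
    """
    sentences = []
    while True:
        end = raw_buffer.find("\r\n")
        if end == -1:
            break  # only a partial sentence left; keep it for the next read
        line, raw_buffer = raw_buffer[:end], raw_buffer[end + 2:]
        if not line.startswith("$"):
            if discard_on_garbage:
                # Assumed analogue of: rawBuffer = ""; return;
                return sentences, ""
            continue  # skip just this fragment, keep the rest
        sentences.append(line)
    return sentences, raw_buffer


# A read that starts mid-sentence, as can happen on a serial port.
chunk = "5.1,M*4A\r\n$GNGGA,a\r\n$GNGLL,b\r\n$GNGG"
with_reset = drain(chunk, discard_on_garbage=True)
without_reset = drain(chunk, discard_on_garbage=False)
```

Under these assumptions, the reset throws away two good sentences and the start of a third whenever a read begins mid-sentence, while skipping only the bad fragment keeps them. Whether that is what actually happens in AgIO is not confirmed by anything in this thread.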