krakenrf/krakensdr_doa

v1.5 spectrum and doa graph freezing/not drawing after a while

Closed this issue · 21 comments

With 1.5, both the spectrum view and the DOA view freeze after a while. This has happened to me on 3 separate runs around the block with the Kraken. It only happens with 1.5, not with 1.43.

Edit:
Luckily I was recording at the time it happened, so I can show what it looks like.

Here is a video of the problem.

This is what the compass looks like when this is happening:

This is what the spectrum/waterfall looks like when this is happening:
image

And here are the first few lines of the log (configured on the web interface, not in the mobile app) from when this happened:
log

Here is the same log visualized with the map:
logVisualized

And here is the log itself:
log.csv

I took a look at the log, and what is obviously not good is that the GPS coordinates are missing. I don't see what would cause this, or why it would have any effect on the spectrum or DOA graph, but it is the only deviation I can see in the log.

I am also running the Wigle WiFi and BT gathering app (it relies on GPS as well) while using the Kraken, but two apps using GPS was never an issue before, and it still causes no issues with 1.4. So I don't think it has anything to do with it.

Is this on the Pi 4? About how long until it freezes?

Can you try do a Clear Cache and Restart via the system menu box on the configuration screen?

godsic commented

@galaris I was stress testing for some time, but cannot reproduce the issue. Would you be so kind as to follow the guide on making logging more verbose here and then forward us the logs once the issue is triggered?

I will test it and give more details on my next run with the kraken.

I've been trying to replicate this bug too, but without success. Can I have more info about your Pi 4? RAM size, CPU version etc?

@krakenrf I've also run into this freezing. In my experience it usually happens when you switch to other tabs for a long time and then come back to the spectrum tab: the screen is no longer updated.

godsic commented

@redradist clarified that for him the issue happens when the Kraken tab is in the background for a while. @galaris is it the same for you? If so, this might be due to the browser's throttling or fully suspending background tabs. I cannot reproduce it myself and I suspect it is very much browser/version-specific. Please let me know which browser/version you are using as this would simplify debugging a lot.

I have come across the same problem where Chrome will suspend the tab (after leaving it running overnight in an inactive tab), leading to the tab either stopping updates or rarely even hanging. But that is not something new, or something that should be happening with the tab left open.

There is possibly also a memory leak with the dash plots which could cause issues after a long time with it being open.

Well, I haven't had the time to test more, but in my case the freezing of the spectrum/DOA page was consistent in my laptop's Firefox browser, as well as when viewing the web cfg inside the Kraken mobile app.

For me that indicates that the problem is on the server side, not in the browser. Again, this never happens with v1.43, just 1.5. On the status page everything was fine and the counter kept increasing steadily. On the web cfg page, however, both the spectrum view and the DOA view were frozen. Since I have the log, I was able to confirm that there is data in there, so the reception is probably fine and the problem is somewhere in the processing chain. It's as if the data source for these 2 controls (spectrum/graph) is not getting what it needs to work. I assume this is the same data that the mobile app needs as well. And since the log is populated with seemingly OK data, I assume the log data is not exactly what the mobile app needs to work.

Obviously, whenever this happened the mobile app direction finding was also not working.

Edit: updated my initial comment with video and images.

Thanks for the video. Have you tried doing a clear cache and restart from the system control menu?

A freeze like that can sometimes happen if the numba caches have been corrupted somehow.

Not at the time, but as I mentioned, this only happened with 1.5, so I reverted to 1.43. I will try 1.5 again and, if this happens, remember to clear the caches. Still, this indicates that some change in 1.5 is causing it.

I suspect that there might be some minor differences between Pi 4 CPUs, and that anything compiled and cached by Numba on one type of CPU becomes incompatible with another type and causes a crash at some point. But on all my Pi 4s bought over the years (1GB, 4GB, 8GB) I haven't been able to replicate this issue.

If you can confirm whether clearing the cache helps, that would be great.

HB9DTX commented

Hi, I am totally new to krakenSDR. I just touched it for the first time 2 hours ago, but I noticed the same bug. V1.5 is running on a Pi4 Model B (not my setup, so I don't have many more details). In case it helps: when the bug appeared on the spectrum tab, I left the PC screen on the frozen "Spectrum" tab while reading some docs for several minutes. All of a sudden it unfroze as if by magic, and it has now been running stable for maybe 10 minutes. So no clear cache action from my side. That was just my 2 cents. I would be glad if it helps solve the issue.

Hm that temporary freeze might have been numba compiling something for a while. On an initial start with a clear cache it can take 1-2 minutes to compile numba functions on startup. In theory, once the initial compilation is done it is cached and won't need to ever compile again.

As to why it happens only with the Spectrum tab up, I'm not sure yet.

Also another theory: If calibration was lost for some reason, the spectrum screen will stop during recalibration. On rare occasions the calibration could take a minute or so.

Ok after some more investigation, I think this is related to the numba cache corruption problem, so clearing the cache should be fine. I found that the spectrum can crash when moving the VFO by clicking on the waterfall or spectrum. Clearing the cache fixes this.

I don't know why the cache sometimes corrupts, but we'll just have to make sure the cache is cleared by default for every image release from now on.
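For anyone who wants to clear the Numba cache by hand, here is a minimal sketch. It assumes Numba's default on-disk cache layout, where functions decorated with cache=True leave index (.nbi) and data (.nbc) files in a __pycache__ directory next to the source file, or in the directory named by NUMBA_CACHE_DIR if that is set. The function name and the root path are hypothetical, and the Clear Cache and Restart option in the web interface may do more than this.

```python
from pathlib import Path

# Numba's on-disk cache is made of index (.nbi) and data (.nbc) files.
NUMBA_CACHE_SUFFIXES = {".nbi", ".nbc"}


def clear_numba_cache(root):
    """Delete Numba cache files found anywhere under `root`.

    `root` is a hypothetical path: point it at the directory that holds
    the Python sources (or at NUMBA_CACHE_DIR if that variable is set).
    Returns the sorted list of removed paths.
    """
    # Collect first, then delete, so the tree is not modified mid-walk.
    targets = [
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in NUMBA_CACHE_SUFFIXES
    ]
    for path in targets:
        path.unlink()
    return sorted(targets)
```

Stop the software before running it, then restart. The next start will be slow because Numba has to recompile everything.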

kk6i commented

I have just started using Kraken for ham radio T-hunts. I am using v1.5 on a Raspberry Pi 4 with 2 GB of RAM, with an Android Tab A7 as the mobile hotspot, running the KrakenSDR app for mapping. I also experienced the freezing after about 15 minutes of T-hunting. Each time the program froze, I had to start a new log file, and in a few cases reboot the Pi and Kraken. After about 2 hours of hunting, we ended up with 7 log files, with a new file created after each "lock-up" of the KrakenSDR app. Each time, I checked back on google to the Pi mapped IP and several of the connections were showing red rather than green. To me it seemed like the app at some point became overloaded with data and froze up.

kk6i commented

I should also add that being new to the Kraken, I have not tried earlier versions of the software, so I do not have a historical reference of performance on earlier versions of the software.

HB9DTX commented

I did a test drive and noticed that the problem is not as severe as it was last time. The PC screen sometimes freezes, but recovers without any problem, apparently when hovering the mouse here and there on the screen. Maybe it was just a coincidence, but it happened at least 2 times. I didn't need to reboot the RPi or the PC for over 1 hour.

In contrast, the app on my old Samsung S7 was not that stable, but this is maybe for another thread. I had to restart it. It looks like the same issue that kk6i mentions. I also noticed that after 1 hour of use the phone was really hot, to the point that I wasn't allowed to use the camera because the temperature was too high. It's the first time I have experienced this with an app. Quite some data processing, it seems...

I have the same problem, after clicking on spectr

godsic commented

@krakenrf I managed to hit the problem on my Intel NUC-based setup. For me calibration was failing and delay_sync.py produced the attached core dump. So, at least in my case, the problem appears to be in heimdall_daq_fw, not in krakensdr_doa. I am not an expert in numba logs, but the problem appears to be in the following code:

@njit(fastmath=True, cache=True, parallel=True)
def correct_iq(iq_samples_in, iq_samples_out, iq_corrections, M):
    for m in range(M):
        iq_samples_out[m,:] = (iq_samples_in[m,:]-np.mean(iq_samples_in[m,:]))*iq_corrections[m]

FYI @petotamas
delay_sync.log

@godsic If you remove the njit decorator to just use standard Python, does the problem go away?
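For that test, a plain version of the function could look like the sketch below. It keeps the arithmetic of the snippet above and only drops the decorator. The name correct_iq_plain and the return statement are additions for illustration, not part of heimdall_daq_fw.

```python
import numpy as np


def correct_iq_plain(iq_samples_in, iq_samples_out, iq_corrections, M):
    """Same arithmetic as correct_iq, but interpreted (no @njit).

    For each of the M channels: subtract the channel's mean (DC offset),
    then multiply by that channel's complex correction factor.
    The result is written into iq_samples_out in place.
    """
    for m in range(M):
        row = iq_samples_in[m, :]
        iq_samples_out[m, :] = (row - np.mean(row)) * iq_corrections[m]
    # Returned for convenience only; the original returns nothing.
    return iq_samples_out
```

An alternative that needs no code change is Numba's NUMBA_DISABLE_JIT environment variable. Setting it to 1 before the process starts makes the decorated functions run as ordinary Python, which is slower but takes the compiler and its cache out of the picture.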

Should be fixed in recent versions.