gasagna/A76XX

Modem Performance degrades over time, trying to figure out why!

Opened this issue · 4 comments

I've had two of these boards running since the beginning of November. They are supposed to upload data every other second (30 times per minute). In the beginning this worked fine: no problems at all, with upload times of 900-1100 ms. We can live with that.

But now we are experiencing lots of dropouts and high upload times. Switching the SIM-card from an "old" board into a new board (running the same code) gives stable uploads and low upload times as expected.

What could cause the boards to deteriorate like this? Is there something we can do to avoid it?

I was wondering if this could be a result of degrading flash memory. Looking into your examples like SaveCertificates.ino, it seems that the certificate needs to be written as a file somewhere (on the ESP32? or the modem?). How often does this happen?

In the ToDo-list it seems that the overwriting should not occur anymore, but I have a hard time figuring out if this is the case. The certOverwrite function overwrites no matter what (but perhaps that is intended):

    // delete certificate if exists, then download
    int8_t certOverwrite(const char* cert, const char* certname) {
        int8_t retcode;

        if (certExists(certname)) {
            retcode = certDelete(certname);
            A76XX_RETCODE_ASSERT_RETURN(retcode);
        }

        retcode = certDownload(cert, certname);
        A76XX_RETCODE_ASSERT_RETURN(retcode);

        return retcode;
    }
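
For illustration, here is a minimal sketch of a "write only if missing" variant, which would avoid repeated flash writes when the same certificate is reused. `FakeCertStore` and `certEnsure` are hypothetical names, not part of the library; the store is an in-memory stand-in that only mimics the three calls named in the snippet above, and the return value 0 is assumed to mean success.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the modem's certificate storage.
// It mimics certExists/certDelete/certDownload so the logic can be tried
// on a PC; it is NOT the library's implementation.
struct FakeCertStore {
    std::map<std::string, std::string> files;
    int writes = 0;  // counts simulated flash writes

    bool certExists(const char* certname) const {
        return files.count(certname) != 0;
    }
    int8_t certDelete(const char* certname) {
        files.erase(certname);
        return 0;  // assumption: 0 means success
    }
    int8_t certDownload(const char* cert, const char* certname) {
        files[certname] = cert;
        ++writes;
        return 0;  // assumption: 0 means success
    }
};

// Write the certificate only when no file with that name is stored yet.
// Caveat: a changed certificate under the same name is not detected.
template <typename Store>
int8_t certEnsure(Store& store, const char* cert, const char* certname) {
    if (store.certExists(certname)) {
        return 0;  // already present: skip the flash write
    }
    return store.certDownload(cert, certname);
}
```

Calling `certEnsure` twice with the same name results in a single simulated write, whereas the overwrite variant would write on every call.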

Our code only connects to one server and is (hopefully) using the same certificate over and over again.

This library is still very impressive, and it's massive. I might be overlooking something, but what?

The certificate gets written to the flash memory of the SIMCom module whenever you call* A76XXSecureClient::writeCaCert. This memory is non-volatile, so in principle you could run a sketch once that stores the certificate, and then upload your actual sketch that only uses A76XXSecureClient::setCaCert. In this case, you would never touch the flash memory. But honestly, I do not know what causes your problem. Have you tried turning on the debug output to see where it gets stuck?

*Note that the http and mqtt clients inherit from A76XXSecureClient.

Thanks for getting back to me! I'm grasping at straws here, trying to figure this out.

Our code is just a tweaked version of the HTTPpost example. We're doing the http_client.post() in a Task that loops every 2 seconds. Another task collects samples using ESP-NOW, and the modem task reads them every 2 seconds, creates a JSON string, and then performs an HTTP POST.

Apparently, on a brand new modem this works brilliantly with the exact same code.

On an older one it gets stuck and times out. A lot.

Like here:

2023-12-04 09:18:17.797		--> Executing post request... 
2023-12-04 09:18:17.802		AT+HTTPPARA="URL","https://ourdomain.org:443/api/"
2023-12-04 09:18:17.898		__
2023-12-04 09:18:17.904		OK
2023-12-04 09:18:17.911		AT+HTTPPARA="USERDATA","User-Agent:A7608E-test/0.0.1"
2023-12-04 09:18:17.915		
2023-12-04 09:18:17.921		OK
2023-12-04 09:18:17.926		AT+HTTPDATA=1542,30
2023-12-04 09:18:17.928		
2023-12-04 09:18:17.930		ERROR
2023-12-04 09:18:17.931		G 11
2023-12-04 09:18:17.932		ERROR, code: -2 Generic

"G 11" is printed over the last A76XX_GENERIC_ERROR in http.h -> inputData. (There are so many generic error references that we needed to number them.)

We can also get:

2023-12-04 09:18:19.787		--> Executing post request... 
2023-12-04 09:18:19.790		AT+HTTPPARA="URL","https://ourdomain.org:443/api/"
2023-12-04 09:18:19.791		
2023-12-04 09:18:19.792		DOWNLOAD
2023-12-04 09:18:47.747		ERROR
2023-12-04 09:18:47.748		ERROR, code: -2 Generic

... and:

2023-12-04 09:18:49.769		--> Executing post request... 
2023-12-04 09:18:49.770		AT+HTTPPARA="URL","https://ourdomain.org:443/api/"
2023-12-04 09:18:49.771		
2023-12-04 09:18:49.774		OK
2023-12-04 09:18:49.775		AT+HTTPPARA="USERDATA","User-Agent:A7608E-test/0.0.1"
2023-12-04 09:18:49.776		
2023-12-04 09:18:49.777		ERROR
2023-12-04 09:18:49.778		AT+HTTPDATA=1623,30
2023-12-04 09:18:49.784		
2023-12-04 09:18:49.922		DOWNLOAD{"mm_apikey":"our_key","data":{"station_id":123456789,"station_unixtime":1701677928,"values":[1200-1600 BYTES OF JSON]]}}
2023-12-04 09:18:49.924		
2023-12-04 09:18:49.924		OK
2023-12-04 09:18:49.926		AT+HTTPACTION=1
2023-12-04 09:18:49.928		
2023-12-04 09:18:49.929		ERROR
2023-12-04 09:18:49.930		G 3
2023-12-04 09:18:49.931		ERROR, code: -2 Generic

"G 3" is printed over the last A76XX_GENERIC_ERROR in http.h -> inputData.

One other thing to notice: the number after "Total Send Time" is generally higher on a malfunctioning device, approx. 400-700 ms higher.

Here's a graph showing the problem. It shows the number of values uploaded from that modem for each hour since December 1st. The value should ideally be 3600 for each hour (one every second). Disregard the first and last values, since those hours are incomplete.
(image: graph of uploaded values per hour)

I'm not sure why this is happening. I tried erasing the flash on the ESP32, but that didn't help (it did help on a regular ESP32 earlier, when I had used ESP-NOW and was reverting to normal Wi-Fi for a test).
I'm trying to figure out how to factory reset the modem, I think the command is AT&F.

See logfile:
Logfile.txt

Hi @Moskus, thanks for the additional details. I am a bit lost here, tbh. It seems to me that the problem might be the SIMCOM module, or how my library uses it. The "ERROR" message is thrown when calling several commands, so I think the modem might get stuck somewhere and then start failing repeatedly.

Some comments to help debugging:

  • Do you need to post every other second? Can you post N messages every 2N seconds? I do not know if this causes issues with the modem's memory.
  • You seem to be using tasks. I have not used this library with tasks before, so I am not sure if there could be race conditions or other issues, but as long as only one task deals with the modem serial communications, it should be fine.
  • Does the problem go away if you reset (or power off) the modem?

  • Yes, we need the data to be as live as possible. We'd prefer every second, but the modem is too slow for that.
  • Yes, only one task is communicating with the modem. I learned my lesson a couple of months ago.
  • Reset does "fix" the issue, but I'm not certain in what way, and it might come back rather quickly.
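
Since a reset appears to clear the problem at least temporarily, one possible workaround is to reset the modem automatically after a run of failed uploads. The following is only a sketch of the counting logic; `FailureTracker` is a hypothetical helper, not part of the library, and the threshold is an arbitrary choice.

```cpp
#include <cstdint>

// Hypothetical helper: counts consecutive failed uploads and reports
// when a configured threshold has been reached.
class FailureTracker {
public:
    explicit FailureTracker(uint8_t threshold) : threshold_(threshold) {}

    // Record the outcome of one upload attempt.
    // Returns true when the caller should reset the modem.
    bool record(bool success) {
        if (success) {
            failures_ = 0;  // any success ends the failure streak
            return false;
        }
        if (++failures_ < threshold_) {
            return false;
        }
        failures_ = 0;  // start counting afresh after requesting a reset
        return true;
    }

    uint8_t failures() const { return failures_; }

private:
    uint8_t threshold_;
    uint8_t failures_ = 0;
};
```

In the modem task, the result of each post would be passed to `record()`, and a `true` return value would trigger whatever reset or power-cycle routine the board supports. This does not explain the degradation; it only limits how long the device stays stuck.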

However:
I've had some discussions with the carrier, and one problem (among many, perhaps) might be that the connection sometimes drops down to 2G, which uses GPRS speeds. Is there a way to force the connection to be LTE/4G only?

I'm looking through the code and can't find it, but I might not be looking hard enough...
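
On SIMCom modules the preferred network mode is usually selected with the `AT+CNMP` command. The sketch below only builds the command string; the numeric values are the ones commonly listed in SIMCom AT command manuals and should be checked against the manual for the exact A76XX module in use. `PreferredMode` and `preferredModeCommand` are hypothetical names, and whether the library exposes a wrapper for this command is not known here.

```cpp
#include <string>

// Preferred-mode values for AT+CNMP as commonly documented for SIMCom
// modules (assumption: verify against the A76XX AT command manual).
enum class PreferredMode : int {
    Automatic = 2,
    GsmOnly = 13,
    WcdmaOnly = 14,
    LteOnly = 38,
};

// Build the AT command that selects the given preferred mode.
std::string preferredModeCommand(PreferredMode mode) {
    return "AT+CNMP=" + std::to_string(static_cast<int>(mode));
}
```

The resulting string (for LTE only, `AT+CNMP=38`) would be sent over the modem's serial port followed by a carriage return, and the modem should answer `OK` if it accepts the setting.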