joelguittet/mender-mcu-client

Failure to check for deployment when http response is chunked

Closed this issue ยท 10 comments

Part: ESP32-S3-WROOM-1
ESP-IDF: 5.2.0

Hi there ๐Ÿ‘‹ I am attempting to integrate this client into an ESP32-S3 based design. However I am having issues getting it all to work. I am able to get the device added into my Mender account and successfully authenticated, however it always fails when attempting to check for deployments.

I (5776) info: Connected to AP!
I (6291) mender: ./components/mender-mcu-client/mender-mcu-client/platform/board/esp-idf/src/mender-storage.c (169): OTA ID or artifact name not available
I (6289) mender_ota: Mender inventory initialized
--------------------------------------------------------
Task Name       | Stack High Water Mark
--------------------------------------------------------
         mender | 10896 bytes
Keyboard Manage | 14928 bytes
   WiFi Manager | 10528 bytes
        Tmr Svc | 4640 bytes
           IDLE | 2800 bytes
           IDLE | 2752 bytes
            tiT | 6544 bytes
    USB Manager | 20048 bytes
  Power Manager | 16880 bytes
           ipc0 | 1984 bytes
     UI Manager | 252544 bytes
      esp_timer | 12384 bytes
           ipc1 | 1968 bytes
           wifi | 16800 bytes
        sys_evt | 6288 bytes
        TinyUSB | 13088 bytes
--------------------------------------------------------
Free Heap Size: 45687 bytes
Minimum Ever Free Heap Size: 43255 bytes
--------------------------------------------------------
I (6512) info: OTA service started.
E (9128) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-api.c (652): rcvd 0
E (9129) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-api.c (652): rcvd 512
I (9141) mender_ota: Mender client authenticated
E (9143) mender: ./components/mender-mcu-client/mender-mcu-client/platform/board/esp-idf/src/mender-ota.c (168): Unable to get running version state
I (10138) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-client.c (443): Checking for deployment...
E (11093) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-api.c (293): Invalid response
E (11093) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-client.c (445): Unable to check for deployment

I added a bit of debug code within mender-api.c trying to determine if I was just getting a 0 length response (Placed on line 651)

Line 651: mender_log_error("rcvd %d", response_length);

My config related to the client is as follows (which I believe is basically default aside from setting my server URL (obviously not FOO))

#
# Mender client Configuration
#
CONFIG_MENDER_SERVER_HOST="FOO"
CONFIG_MENDER_SERVER_TENANT_TOKEN=""
CONFIG_MENDER_CLIENT_AUTHENTICATION_POLL_INTERVAL=600
CONFIG_MENDER_CLIENT_UPDATE_POLL_INTERVAL=1800
# CONFIG_MENDER_CLIENT_ADD_ON_CONFIGURE is not set
CONFIG_MENDER_CLIENT_ADD_ON_INVENTORY=y
CONFIG_MENDER_CLIENT_INVENTORY_REFRESH_INTERVAL=28800
# end of Mender client Configuration

#
# Mender advanced Configuration
#
CONFIG_MENDER_RTOS_WORK_QUEUE_STACK_SIZE=20
CONFIG_MENDER_RTOS_WORK_QUEUE_PRIORITY=5
CONFIG_MENDER_RTOS_WORK_QUEUE_LENGTH=10
# end of Mender advanced Configuration

I have no problem building and executing an OTA using the mender example project for ESP32 that you provide but I'm at a total loss as to what could be the problem. Digging into the error within mender-api.c it appears that the response pointer is still NULL and thus when it's handed to cJSON for parsing the output cJSON object pointer is NULL thus the error.

I attempted to follow the callbacks and I could never even find where the *response variable would ever be allocated. My assumption at this point is a memory problem but I just don't understand where or how to solve it.. Any help is greatly appreciated!

A bit more info. Here is my OTA init

bool ota_init(void) {

    if(mender_client_initialized) {
        log_e("Can't start. Mender client already started!");
        return false;
    }

    // Retrieve the device SN
    memset(device_sn, 0, sizeof(device_sn));
    cfg.getSerialNumber(device_sn);

    sprintf(artifact_name, "alpha-%s-v%d.%d.%d", BUILD_VARIANT, PROD_VERSION_MAJOR, PROD_VERSION_MINOR, PROD_VERSION_PATCH);
    sprintf(device_type, "%s", ALPHA_CFG_DEVICE_TYPE);

    mender_client_config_t    mender_client_config    = { .mac_address                  = device_sn,
                                                    .artifact_name                = artifact_name,
                                                    .device_type                  = device_type,
                                                    .host                         = NULL,
                                                    .tenant_token                 = NULL,
                                                    .authentication_poll_interval = 0,
                                                    .update_poll_interval         = 0,
                                                    .recommissioning              = false };
    mender_client_callbacks_t mender_client_callbacks = { .authentication_success = authentication_success_cb,
                                                          .authentication_failure = authentication_failure_cb,
                                                          .deployment_status      = deployment_status_cb,
                                                          .ota_begin              = mender_ota_begin,
                                                          .ota_write              = mender_ota_write,
                                                          .ota_abort              = mender_ota_abort,
                                                          .ota_end                = mender_ota_end,
                                                          .ota_set_boot_partition = mender_ota_set_boot_partition,
                                                          .restart                = restart_cb };

    if(MENDER_OK == mender_client_init(&mender_client_config, &mender_client_callbacks)) {
        mender_client_initialized = true;

#ifdef CONFIG_MENDER_CLIENT_ADD_ON_INVENTORY
        mender_inventory_config_t mender_inventory_config = { .refresh_interval = 0 };
        ESP_ERROR_CHECK(mender_inventory_init(&mender_inventory_config));
        ESP_LOGI(TAG, "Mender inventory initialized");
#endif /* CONFIG_MENDER_CLIENT_ADD_ON_INVENTORY */
    }

    return mender_client_initialized;
}

All of the callbacks are stock and match the example project.

I retried using the example client app again and I get the same failure. We are using a self-hosted version of Mender and we previously had an issues with a body size limit for the API gateway and I'm suspecting there's a similar problem occuring.

To clarify, the Checking for deployment... succeeds if there is no deployment available.

I (9908) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-client.c (443): Checking for deployment...
I (10128) esp-x509-crt-bundle: Certificate validated
I (10858) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-client.c (451): No deployment available

The problem only occurs when there is a deployment available. Still happy to hear your thoughts @joelguittet

I went and created a trial account at Mender to test using the hosted enterprise server. The client in my firmware connects right away and applies the OTA ๐Ÿ˜“

@joelguittet is there any reason this client shouldn't work with the Open Source / Self-hosted version of the server?

I've been working with our backend dev and come to the conclusion that our nginx proxy server sitting in between the client and the Mender servier is trying to send HTTP responses in chunks whereas the client expects one large response that we then read 512 bytes at a time from. Closing.

Hello @stephendpmurphy

I'm reading the whole messages at once, what an eventful story !!
Sorry for not being able to answer quickly, it seems you understood where the issue comes from. I let you reopened in case it's necessary.

By the way, if you think the nginx configuration you have should be supported by the mender-mcu-client because it is used largely, then I'm happy to study/review/accept/do some modifications.
On my side I already checked the OTA using a private mender server instance following https://docs.mender.io/server-installation, it was working as expected, but not checked recently (I will in a near future using the CI/CD).

Hope you have fun with my library!

Regards
Joel

@joelguittet Yeah sorry I was all over the place and up against a deadline. I was trying to give as much details as possible in case you were to see it in time. ๐Ÿ˜…

Regarding this being an actual use-case.. Maybe? We use the nginx as a proxy server in front of multiple other services which is why our back-end team did it. There is configuration to turn off "chunked" responses which is the solution in this case (just a matter of getting it to work). I enabled debug logs for the HTTP_CLIENT from ESP so I could view sent and received headers. I'll provide the returned headers from the NGINX proxy and the Mender hosted enterprise server.

The key things to note is that the proxy does not send the HEADER=Content-Length: header, and instead has a HEADER=Transfer-Encoding:chunked header.

Versus the Mender enterprise server just sending a HEADER=Content-Length:856 header indicating the total response length to the client and allowing it to do it's bite size reads of 512 bytes.

The solution I suppose here is if a Content-Length header isn't provided then to look for the Transfer-Encoding:chunked header in which you would have to add a new method for reading the response (which I don't know if the ESP HTTP client even supports).

Request & Response headers to/from NGINX gateway proxy

I (7846) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-client.c (443): Checking for deployment...
D (7856) HTTP_CLIENT: Begin connect to: [{REDACTED}]({REDACTED})
D (8568) HTTP_CLIENT: Write header[4]: GET /api/devices/v1/deployments/device/deployments/next?artifact_name=alpha-dbg-v0.4.0&device_type=alpha-gen1 HTTP/1.1
User-Agent: ESP32 HTTP Client/1.0
Host: {REDACTED}
Authorization: Bearer {REDACTED}
Content-Length: 0


D (8941) HTTP_CLIENT: on_message_begin
D (8942) HTTP_CLIENT: HEADER=Server:nginx/1.25.1
D (8943) HTTP_CLIENT: HEADER=Date:Fri, 25 Aug 2023 19:44:00 GMT
D (8945) HTTP_CLIENT: HEADER=Content-Type:application/json; charset=utf-8
D (8952) HTTP_CLIENT: HEADER=Transfer-Encoding:chunked
D (8957) HTTP_CLIENT: HEADER=Connection:keep-alive
D (8962) HTTP_CLIENT: HEADER=Referrer-Policy:no-referrer
D (8968) HTTP_CLIENT: HEADER=Strict-Transport-Security:max-age=31536000; includeSubDomains; preload
D (8977) HTTP_CLIENT: HEADER=Vary:Accept-Encoding
D (8982) HTTP_CLIENT: HEADER=X-Content-Type-Options:nosniff
D (8987) HTTP_CLIENT: HEADER=X-Men-Requestid:d23521f2-01e9-41e9-9e0a-0d0ebc8687e3
D (8995) HTTP_CLIENT: HEADER=X-Xss-Protection:1; mode=block
D (9000) HTTP_CLIENT: http_on_headers_complete, status=200, offset=424, nread=424
D (9008) HTTP_CLIENT: http_on_chunk_header, chunk_length
D (9013) HTTP_CLIENT: http_on_body 83
D (9017) HTTP_CLIENT: Body received in fetch header state, 0x3fcf5dfd, 83
D (9024) HTTP_CLIENT: content_length = -1

Request & Response headers to/from Mender hosted enterprise server.

I (6274) mender: ./components/mender-mcu-client/mender-mcu-client/core/src/mender-client.c (443): Checking for deployment...
D (6283) HTTP_CLIENT: Begin connect to: [https://hosted.mender.io:443](https://hosted.mender.io/)
D (6793) HTTP_CLIENT: Write header[4]: GET /api/devices/v1/deployments/device/deployments/next?artifact_name=alpha-dbg-v0.4.0&device_type=alpha-gen1 HTTP/1.1
User-Agent: ESP32 HTTP Client/1.0
Host: hosted.mender.io
Authorization: Bearer {REDACTED}
Content-Length: 0


D (7179) HTTP_CLIENT: on_message_begin
D (7180) HTTP_CLIENT: HEADER=Date:Fri, 25 Aug 2023 20:06:09 GMT
D (7180) HTTP_CLIENT: HEADER=Content-Type:application/json; charset=utf-8
D (7185) HTTP_CLIENT: HEADER=Content-Length:856
D (7190) HTTP_CLIENT: HEADER=Connection:keep-alive
D (7195) HTTP_CLIENT: HEADER=Referrer-Policy:no-referrer
D (7200) HTTP_CLIENT: HEADER=Strict-Transport-Security:max-age=31536000; includeSubDomains; preload
D (7209) HTTP_CLIENT: HEADER=X-Content-Type-Options:nosniff
D (7215) HTTP_CLIENT: HEADER=X-Men-Requestid:d45ed259-0a52-4270-8dc2-b5715ed67286
D (7222) HTTP_CLIENT: HEADER=X-Xss-Protection:1; mode=block
D (7228) HTTP_CLIENT: http_on_headers_complete, status=200, offset=372, nread=372
D (7236) HTTP_CLIENT: http_on_body 140
D (7239) HTTP_CLIENT: Body received in fetch header state, 0x3fcf5bfc, 140
D (7246) HTTP_CLIENT: content_length = 856
I (7251) mender: ./components/mender-mcu-client/mender-mcu-client/platform/http/esp-idf/src/mender-http.c (140): Content length is 856
D (7263) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=856
D (7270) HTTP_CLIENT: need_read=372, byte_to_read=372, rlen=372, ridx=140
D (7277) HTTP_CLIENT: http_on_body 372
I (7281) mender: ./components/mender-mcu-client/mender-mcu-client/platform/http/esp-idf/src/mender-http.c (146): Read 512 bytes
D (7293) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=856
D (7300) HTTP_CLIENT: need_read=512, byte_to_read=512, rlen=344, ridx=0
D (7306) HTTP_CLIENT: http_on_body 344
D (7310) HTTP_CLIENT: http_on_message_complete, parser=0x3fcf79cc
D (7316) HTTP_CLIENT: is_data_remain=0, is_chunked=0, content_length=856

Thanks for the details.
I m not sure how to reproduce that, and if any solution is available on the device side, but I have modified the management of the content length due to some other issues at the beginning. Can you check if reverting commit 70101c4 can help in your situation?

I would agree. As the Client sits today it can't handle these chunked responses from the Proxy and would need to be updated to be a bit smarter about reading the headers and determining how to handle the response. I can try reverting but I believe the ESP API is actually reporting back a content-length of -1 so I'm not sure any changes in the client will change the behavior. We are still working on getting the proxy config to work but if that doesn't work you will likely see a PR soon adding chunked response handling to the HTTP request API in this client.

I debugged it a bit further and here's what I found:

  1. Sometimes response from server (proxy or mender's api-gateway) using HTTP/1.1 protocol gets chucked. I don't know how hosted version works but my guess is they cache responses aggressively, and therefore they know the length of json in response and return Content-Length to client. The responses from local mender instance most likely work because even if response is technically chunked it fits into a single tcp frame and gets parsed by esp http client fully in one shot. This is not the case for our mender configuration because we use AWS S3 bucket without public access instead of MinIO and artifact uris are very long (because they contain signature as query string that allows pulling files from private bucket).
  2. In case of chunked response esp_http_client_fetch_headers returns zero as content_length: espressif/esp-idf/components/esp_http_client/esp_http_client.c#L1338-L1341. Before returning 0 it logs content_length=-1 and that's what we see in logs.
  3. The mender-mcu-client in its turn uses returned content_length as final content length of the response: joelguittet/mender-mcu-client/platform/http/esp-idf/src/mender-http.c#L134. However it's not the case when we don't have Content-Length in response headers and response is chucked.
  4. In addition to returning zero, esp_http_client_fetch_headers sets client->response->is_chunked to true.

So, it's not enough to rely on content_length returned from esp_http_client_fetch_headers. If it equals 0 I think we should do something like this (forgive my kind of pseudo-code here, I can't test it right now):

while (!esp_http_client_is_complete_data_received(client)) {
    char data[512];
    int  read_length = esp_http_client_read_response(client, data, sizeof(data));
    if (read_length > 0) {
        /* Transmit data received to the upper layer */
        if (MENDER_OK != (ret = callback(MENDER_HTTP_EVENT_DATA_RECEIVED, data, (size_t)read_length, params))) {
            mender_log_error("An error occurred, stop reading data");
            goto END;
        }
    } else {
        // most likely all is okay but maybe we should check for errors here?
    }
}

esp_http_client_read_response calls esp_http_client_read multiple times under the hood.

I think it would be reproducible locally if we can make mender server to generate very long artifact URIs.

Support for chunk content would be great to permit multiple use case, and yours particularly. I m reopening the issue changing the title.

About the pseudo code, it s look like to reverting the commit I have mentioned, but without fetching the headers, looks not bad probably...

The commit was done to handle an issue (I don't remember the exact detail) with mender configure addon. So to validate we will need to check there is no regression there (I will be able to).

I will be happy to review a PR :-)