Help with grpc error handling and time to receive device update event

Question


ipzago opened this issue a year ago · 5 comments


Hello,

We are trying to improve how we use PLGD, and some questions have come up. Could you help us?

Regarding errors: is there any place where we can consult the possible errors returned by PLGD?
We're trying to better evaluate the errors returned from gRPC calls (get, update, create and delete),
and for this we need to know the possible errors and the best way to identify them without having to compare strings.

Should a gRPC attempt to update a read-only resource return an error? We were expecting one, but no error is received.
E.g.: we tried to perform a gRPC update of the module's /cfg access identifier and no error was received, just the
resource payload with no changes.

Sometimes it takes several seconds to receive events updating device metadata or unregister/register operations.
Is there any timer in PLGD or DPS that checks/sends updates?
E.g.: in our tests, we pause or disconnect the module container from the network and wait for the system to react, and sometimes
it takes longer than expected.

Best Regards,
Icaro

Answer 1 · 2023-10-25T11:59:26.000Z

> Regarding errors: is there any place where we can consult the possible errors returned by PLGD?
> We're trying to better evaluate the errors returned from gRPC calls (get, update, create and delete),
> and for this we need to know the possible errors and the best way to identify them without having to compare strings.

The error codes that can be returned are the standard gRPC codes plus our extended codes. The status the device returned for the action is specified in data.status (link), and data.status is also used to set the corresponding gRPC code.
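To illustrate identifying errors by code rather than by message text, here is a minimal sketch that classifies a numeric gRPC status code. It assumes you have already extracted the numeric code from the failed call (for example with `status.Code(err)` from `google.golang.org/grpc/status` in Go, or `err.code().value[0]` with Python's grpcio). Only the canonical codes 0 to 16 defined by gRPC are listed; plgd's extended codes are not, since their values are not given here. The `RETRYABLE` set and the category names are hypothetical choices, not part of plgd.

```python
from enum import IntEnum


class GrpcCode(IntEnum):
    """Canonical gRPC status codes (numeric values fixed by gRPC)."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


# Hypothetical policy: transient failures that are worth retrying.
RETRYABLE = {GrpcCode.UNAVAILABLE, GrpcCode.DEADLINE_EXCEEDED, GrpcCode.ABORTED}


def classify(code: int) -> str:
    """Map a numeric gRPC status code to a coarse action, without string matching."""
    try:
        c = GrpcCode(code)
    except ValueError:
        # Any code outside 0..16 (possibly an extended code) is not covered here.
        return "unknown"
    if c is GrpcCode.OK:
        return "ok"
    if c in RETRYABLE:
        return "retry"
    if c in (GrpcCode.UNAUTHENTICATED, GrpcCode.PERMISSION_DENIED):
        return "auth"
    return "fail"
```

Switching on the integer keeps the client stable even if the human-readable error messages change between releases.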

> Should a gRPC attempt to update a read-only resource return an error? We were expecting one, but no error is received.
> E.g.: we tried to perform a gRPC update of the module's /cfg access identifier and no error was received, just the
> resource payload with no changes.

Plgd doesn't know whether a resource is read-only, so the request is forwarded to the device, and it is up to the device to return an error code. The body can also contain data describing the error. In that case, the error code is set in data.status of the gRPC response, and the body in data.content.data is usually CBOR-encoded.
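Because the outcome lives in data.status rather than in a gRPC error, a client has to inspect the response itself. The sketch below assumes a hypothetical, already-decoded response dict shaped like `{"data": {"status": ..., "content": {"data": {...}}}}` (content.data decoded from CBOR beforehand), and an assumed set of success status names that should be checked against the hub's protobuf definitions. It raises on a non-success status and, since a device may answer with success while ignoring read-only properties, also reports which requested values were not applied. The property names in the usage are made up.

```python
from typing import Any, Dict

# Assumption: success status names as they might appear when the response is
# rendered to JSON; verify them against the hub's protobuf definitions.
SUCCESS_STATUSES = {"OK", "CREATED", "ACCEPTED"}


class DeviceError(Exception):
    """Raised when data.status reports that the device rejected the request."""


def unapplied_properties(response: Dict[str, Any], requested: Dict[str, Any]) -> Dict[str, Any]:
    """Return the requested properties whose values the device did not apply.

    `response` is a hypothetical dict shaped like
    {"data": {"status": ..., "content": {"data": {...}}}}, with content.data
    already decoded from CBOR into a dict.
    """
    data = response.get("data") or {}
    status = data.get("status")
    content = (data.get("content") or {}).get("data")
    if status not in SUCCESS_STATUSES:
        # The device rejected the request; the content may describe the error.
        raise DeviceError(f"device returned status {status!r}: {content!r}")
    current = content or {}
    # A success status does not guarantee the change was applied (e.g. a
    # read-only property left untouched), so compare requested vs. returned values.
    return {k: current.get(k) for k, v in requested.items() if current.get(k) != v}
```

If a device answers an update without a payload, every requested property would be reported as unapplied; in that case a follow-up get of the resource is the safer source for the comparison.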

> Sometimes it takes several seconds to receive events updating device metadata or unregister/register operations.
> Is there any timer in PLGD or DPS that checks/sends updates?
> E.g.: in our tests, we pause or disconnect the module container from the network and wait for the system to react, and sometimes
> it takes longer than expected.

This depends on various factors and use cases:

- When a module retrieves its configuration from DPS, it retries at intervals in an infinite loop. In the default worst-case scenario, this can take up to 130 seconds (configurable).
- The keepalive configured in the CoAP gateway determines how OFFLINE events are handled when a device has been disconnected.
- The heartbeat configured in the CoAP gateway is relevant for handling a SIGKILL of the CoAP gateway.
- How long the module has been unable to reach the hub also matters. The cloud connector retries at intervals of seconds in an infinite loop, with a default worst-case scenario of 66 seconds (configurable).

Answer 2 · 2023-10-25T13:32:28.000Z

Hey @jkralik, many thanks for the answer.
Could you please mention which variables we should set when using the PLGD bundle in Docker?

Answer 3 · 2023-10-25T16:05:33.000Z

@ipzago To use ghcr.io/plgd-dev/hub/bundle:2.12.1, you need to run it twice with a volume mounted at /data:

The first run generates the configuration files in the volume. Start it with the following command, then stop it:
```
docker run -it --rm -v /tmp/bundle_data:/data ghcr.io/plgd-dev/hub/bundle:2.12.1
```
Modify the configurations in /tmp/bundle_data. For example, you can edit /tmp/bundle_data/coap-gateway-secure.yaml by changing values like apis.coap.keepAlive.timeout or serviceHeartbeat.timeToLive to 10s.

For the second run, start the bundle in the background; it uses the modified configurations from the volume:

```
docker run -d --name plgd-bundle -v /tmp/bundle_data:/data ghcr.io/plgd-dev/hub/bundle:2.12.1
```

Answer 4 · 2023-11-29T11:09:04.000Z

@jkralik I performed some local tests setting both values to 10s and to 5s, and it seems to make no difference. I mounted the volume as indicated and confirmed that the file in /data contained the right values. Is there any other configuration we could try?

Answer 5 · 2023-11-29T12:08:25.000Z

@ipzago Could you please check the coap-gateway logs in the file /data/log/coap-gateway.log?

There will be one INFO entry (usually the second line) similar to `{"L":"INFO","T":"2023-11-16T08:19:25.93088978Z","M":"config: ....`. There you can verify whether the values were loaded as configured.
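With the volume mounted as above, that file is `/tmp/bundle_data/log/coap-gateway.log` on the host. The sketch below finds the config entry for inspection, assuming the log holds one JSON object per line with `"L"` (level) and `"M"` (message) keys, as in the sample entry. The function name is hypothetical, and since the exact layout of the config message is not shown here, it simply returns the message for you to read.

```python
import json
from typing import Optional


def find_config_entry(log_path: str) -> Optional[str]:
    """Return the message of the first INFO entry whose message starts with "config:".

    Assumes one JSON object per line with "L" (level) and "M" (message) keys;
    lines that are not JSON objects are skipped.
    """
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(entry, dict):
                continue
            msg = entry.get("M", "")
            if entry.get("L") == "INFO" and isinstance(msg, str) and msg.startswith("config:"):
                return msg
    return None
```

For example, `print(find_config_entry("/tmp/bundle_data/log/coap-gateway.log"))` prints the loaded configuration, where you can look for the keepAlive and heartbeat values you set.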