Help with grpc error handling and time to receive device update event
ipzago opened this issue · 5 comments
Hello,
We are trying to improve our PLGD usage and some questions appeared. Could you help us?
- Talking about errors, is there any place where we could look up the possible errors returned from PLGD?
  We're trying to better evaluate the errors returned from gRPC calls (get, update, create and delete),
  and for this we need to know the possible errors and the best way to identify them without needing to compare strings.
- Should a gRPC attempt to update a read-only resource return an error? We were expecting one, but no error is received.
  E.g.: We tried to perform a gRPC update of the module's /cfg access identifier and no error was received, just the
  resource payload with no changes.
- Sometimes it takes seconds to receive events updating device metadata or operations of unregister/register.
  Is there any timer in PLGD or DPS to check/send updates?
  E.g.: In our tests, we pause or disconnect the module container from the network and wait for the system to react, and sometimes
  it takes longer than expected.
Best Regards,
Icaro
Talking about errors, is there any place where we could look up the possible errors returned from PLGD?
We're trying to better evaluate the errors returned from gRPC calls (get, update, create and delete),
and for this we need to know the possible errors and the best way to identify them without needing to compare strings.
The error codes that can be returned are defined by the gRPC codes and our extended codes. The status the device returned for the action is specified in data.status, which is also used to set the proper gRPC code.
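As a rough illustration of identifying errors by code rather than by message text, the sketch below mirrors the standard gRPC status code numbers in a Python enum and sorts them into coarse categories. The names GrpcCode and classify and the grouping itself are assumptions made for illustration, not part of PLGD, and the extended codes mentioned above are not covered. With the grpcio client, the code of a failed call can be read from the raised error's code() method, so no string comparison is needed.

```python
from enum import IntEnum


class GrpcCode(IntEnum):
    """Standard gRPC status codes with their numeric values."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


# Hypothetical grouping for illustration; adjust it to your own retry policy.
RETRYABLE = frozenset({
    GrpcCode.DEADLINE_EXCEEDED,
    GrpcCode.RESOURCE_EXHAUSTED,
    GrpcCode.ABORTED,
    GrpcCode.UNAVAILABLE,
})
CALLER_ERRORS = frozenset({
    GrpcCode.INVALID_ARGUMENT,
    GrpcCode.NOT_FOUND,
    GrpcCode.ALREADY_EXISTS,
    GrpcCode.PERMISSION_DENIED,
    GrpcCode.FAILED_PRECONDITION,
    GrpcCode.OUT_OF_RANGE,
    GrpcCode.UNAUTHENTICATED,
})


def classify(code: int) -> str:
    """Map a numeric status code to a coarse category without string matching."""
    try:
        known = GrpcCode(code)
    except ValueError:
        # Not a standard gRPC code, for example a service-specific extension.
        return "extended-or-unknown"
    if known is GrpcCode.OK:
        return "ok"
    if known in RETRYABLE:
        return "retryable"
    if known in CALLER_ERRORS:
        return "caller-error"
    return "server-error"
```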
Should a gRPC attempt to update a read-only resource return an error? We were expecting one, but no error is received.
E.g.: We tried to perform a gRPC update of the module's /cfg access identifier and no error was received, just the
resource payload with no changes.
Plgd doesn't know if a resource is read-only, so the request is sent to the device, and the device needs to return an error code. Additionally, the body can contain data that describes the error. In this case, the error code will be set in data.status of the body in the gRPC response, and the body in data.content.data will be encoded mostly in CBOR.
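Because a rejected update can arrive as an ordinary gRPC response, the client has to inspect data.status itself. A minimal sketch, assuming the response has already been converted to a Python dict and that the status is rendered as an enum name. The names in SUCCESS_STATUSES and the helper update_succeeded are assumptions for illustration and should be checked against the hub's protobuf definitions.

```python
# Assumed set of status names that mean the device accepted the request.
SUCCESS_STATUSES = frozenset({"OK", "CREATED", "ACCEPTED"})


def update_succeeded(response: dict) -> bool:
    """Return True only if data.status is present and signals success."""
    status = response.get("data", {}).get("status")
    return status in SUCCESS_STATUSES
```

If the status signals a failure, the payload in data.content.data may carry further details from the device and would need to be decoded according to its content type.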
Sometimes it takes seconds to receive events updating device metadata or operations of unregister/register.
Is there any timer in PLGD or DPS to check/send updates?
E.g.: In our tests, we pause or disconnect the module container from the network and wait for the system to react, and sometimes
it takes longer than expected.
This depends on various factors and use cases:
- When a module is able to retrieve configurations from DPS, it goes through intervals in an infinite loop. In the default worst case scenario, it could take up to 130 seconds (configurable).
- The configured keepalive in the CoAP gateway determines the handling of OFFLINE events when a device has been disconnected.
- The configuration of the heartbeat in the CoAP gateway is relevant for handling SIGKILL events in the CoAP gateway.
- How long the module has been unable to reach the hub is also a factor. The cloud connector iterates in an infinite loop at intervals of seconds, with a default worst-case scenario of 66 seconds (configurable).
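For tests that disconnect a module and wait for the hub to react, it can help to wait with an explicit deadline based on the worst-case figures above instead of a fixed sleep. A minimal sketch: wait_until and its parameters are hypothetical helper names, and the predicate stands in for whatever check the test performs (for example, whether an OFFLINE event has been received).

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float, poll_s: float = 1.0) -> float:
    """Poll predicate until it returns True.

    Returns the elapsed time in seconds, or raises TimeoutError once
    timeout_s has passed without the predicate becoming True.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        if predicate():
            return time.monotonic() - start
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))
```

A test could then call wait_until(device_is_offline, timeout_s=130), where device_is_offline is a hypothetical check supplied by the test, and log the returned duration to see how close the system gets to the configured worst case.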
Hey jkralik, many thanks for the answer.
Could you please just mention which variables we should set when using the PLGD bundle on Docker?
@ipzago To use ghcr.io/plgd-dev/hub/bundle:2.12.1, you need to perform two runs with a volume mounted to the directory /data:
- The first run generates the configurations in the volume. Use the following command and then stop it:
  docker run -it --rm -v /tmp/bundle_data:/data ghcr.io/plgd-dev/hub/bundle:2.12.1
- Modify the configurations in /tmp/bundle_data. For example, you can edit /tmp/bundle_data/coap-gateway-secure.yaml by changing values like apis.coap.keepAlive.timeout or serviceHeartbeat.timeToLive to 10s.
- In the second run, use the configurations from the volume with the same volume mount:
  docker run -d --name plgd-bundle -v /tmp/bundle_data:/data ghcr.io/plgd-dev/hub/bundle:2.12.1