Yaskawa-Global/motoros2

Provide human-readable definitions/alarms for RCL(C) initialisation errors

ted-miller opened this issue · 9 comments

@ted-miller: we might want to consider reporting some of these RCL(C) initialisation errors, or defining some subcodes for them.

Especially the NODE, PUBLISHER, SUBSCRIPTION, etc categories seem to be caused by things which could often be solved by users without needing to post on the tracker here.

Originally posted by @gavanderhoorn in #125 (reply in thread)

Looking into this a bit: this wouldn't be too much work to implement, but I'm trying to figure out a good place for these kinds of "RCL(C) initialisation errors/alarms" in ErrorHandling.h.

We don't really have a main alarm category for those I believe:

//===================================
//Main Codes
//===================================
typedef enum
{
ALARM_TASK_CREATE_FAIL = 8010,
ALARM_ASSERTION_FAIL,
ALARM_ALLOCATION_FAIL,
ALARM_CONFIGURATION_FAIL,
ALARM_INFORM_JOB_FAIL,
ALARM_DAT_FILE_PARSE_FAIL,
} ALARM_MAIN_CODE;

The issue title already refers to these kinds of problems as "initialisation errors", so it doesn't seem like they would fit under ALARM_CONFIGURATION_FAIL or ALARM_TASK_CREATE_FAIL. And we only use ALARM_ASSERTION_FAIL for actual assertions, which don't get much text on the PP.

do we still have 'space' for a category dedicated to RCLC_INIT errors/alarms?

We could then raise a regular alarm and provide some descriptive text as part of it.


Edit: we could also make it a 'micro-ROS' category.

As to subcodes: initially I thought we could perhaps opt to just 'forward' the various RCL_RET_* defined codes.

However, there is no guarantee those values will never change and that would mean we'd have to keep track of that and update our documentation if that happens.

Two approaches I've come up with:

  1. define a single subcode, reuse one of the alarm categories, and include the RCL(C) error code as part of the message text

  2. define (something like) an ALARM_RCL_RCLC_FAIL category, then (re)define relevant (perhaps all?) RCL_RET_* as subcodes (in a new typedef enum). RCL(C) API failures could then be posted to the PP:

    mpSetAlarm(ALARM_RCL_RCLC_FAIL, "Invalid node name",
        SUBCODE_RCL_RCLC_FAIL_NODE_INVALID_NAME);

the former would be trivial to implement (although we need to figure out which category to use).

The latter would be a little more work, but would make RCL(C) errors more 'explicit' and we could document each of them individually.

I first preferred the second approach, but thinking about it, the former is less work, gets us what we're looking for (a way to document at least some of the RCL(C) initialisation errors/return codes when they could potentially be solved by the user) and wouldn't require adding 30 new subcodes and a new alarm category.

We could still document individual values (ie: 202) in the troubleshooting guide. The entry point would just be a single alarm + subcode, and the text would clarify which RCL(C) error it really is.

I'm in favor of the second option. I think that the purpose of raising these new alarms is to make it easier for the user the debug. In that regard, providing an explicit definition of the error (as opposed to some number that must be looked up) is much more user-friendly.

It would mean adding quite a few new subcodes, with a lot of documentation to write as well.

I'm going to postpone this to after 0.2.0, which would focus on updating the underlying infrastructure (newer Humble packages, Iron compatibility).

It would mean adding quite a few new subcodes, with a lot of documentation to write as well.

An additional consideration: IIUC, subcodes are limited to a UCHAR, which would mean a maximum of 255.

I realise there are only so many rcl(c) errors we might want to directly map, but they'd still be taking up valuable 'space'.

I just had another occurrence of this and wasted a good amount of time trying to remember how to track down the rcl error codes.

This shouldn't be difficult to implement, so I retargeted the next milestone.

Should this be 0.2.1 instead?

0.2.0 was really intended to be just about updating libmicroros and adding Iron support.

Sure. I just want to get it on the todo list.