ros-navigation/navigation2

Error codes in NavigateToPose/NavigateThroughPoses


Moving the conversation here to discuss the possibility of adding error codes for the NavigateToPose/NavigateThroughPoses actions, similar to the ones the other servers (planner, controller, etc.) already have.

I'm thinking of a use case where a high-level system calls the NavigateToPose action and needs to know, from the action client itself, why the robot can't reach the goal. It would be nice to add or populate error codes from the "subservers", such as tf_error, timeout, invalid path, etc.

@SteveMacenski what is your appetite for the interface change of adding a string error_msg to go with the uint16 error_code in the nav2_msgs/action/*.action result definitions?

That would allow a pathway to propagate failure messages with details out to the action client.

For instance, in planner_server, the calls to exceptionWarning could either return a string or take one by reference, which would then be added to result->error_msg.

exceptionWarning(curr_start, curr_goal, goal->planner_id, ex);
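A minimal sketch of the "return a string" option, assuming a simplified exceptionWarning that drops the start and goal pose arguments. ComputePathResult, reportPlanningFailure, and the NO_VALID_PATH value are hypothetical stand-ins for illustration, not the generated nav2_msgs types or their real constants.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ComputePathToPose::Result once the proposed
// error_msg field exists; the code value below is illustrative only.
struct ComputePathResult
{
  static constexpr uint16_t NO_VALID_PATH = 304;
  uint16_t error_code{0};
  std::string error_msg;
};

// Simplified, hypothetical exceptionWarning: the real one also takes the
// start and goal poses. It builds the text, would log it, and returns it.
std::string exceptionWarning(const std::string & planner_id, const std::exception & ex)
{
  std::string msg = planner_id + " plugin failed to plan: " + ex.what();
  // The existing warning log call would use msg here.
  return msg;
}

// Caller side: one string feeds both the log and result->error_msg.
void reportPlanningFailure(
  ComputePathResult & result, uint16_t code,
  const std::string & planner_id, const std::exception & ex)
{
  result.error_code = code;
  result.error_msg = exceptionWarning(planner_id, ex);
}
```

Returning the string keeps the logging and the result population in sync, so the text the user sees matches the server's own warning.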

Similarly for the controller server:

result->error_code = Action::Result::INVALID_CONTROLLER;

For backward compatibility would you prefer new actions?

We had a discussion about that when we initially added the error codes. We decided on enums so that you aren't parsing strings for exact matches to determine programmatically which error occurred. Having error messages in addition to them could be useful, I agree!

It would just be a bit of repetitive work: add it to each action interface; populate it in the various servers wherever they populate the error code (I think you can use e.what() for most of them? I think all the exceptions must have their string defined when thrown); then the slightly more substantive problem of how to store and serve it on the behavior tree, as we do for the codes.
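A sketch of the e.what() approach for a controller-style server, under the assumption that the exception types derive from std::runtime_error so the message set at throw time is available. ControllerException, InvalidController, ControllerTFError, FollowPathResult, and runAndReport are local stand-ins with illustrative code values, not the actual nav2_core or nav2_msgs definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in exception hierarchy; what() returns the string given at throw time.
struct ControllerException : std::runtime_error
{
  explicit ControllerException(const std::string & what) : std::runtime_error(what) {}
};
struct InvalidController : ControllerException
{
  explicit InvalidController(const std::string & what) : ControllerException(what) {}
};
struct ControllerTFError : ControllerException
{
  explicit ControllerTFError(const std::string & what) : ControllerException(what) {}
};

// Hypothetical result with the proposed error_msg field; values are illustrative.
struct FollowPathResult
{
  static constexpr uint16_t UNKNOWN = 100;
  static constexpr uint16_t INVALID_CONTROLLER = 101;
  static constexpr uint16_t TF_ERROR = 102;
  uint16_t error_code{0};
  std::string error_msg;
};

// Runs `step`; on failure fills the enum (for BT logic) and e.what() (for humans).
// Derived exception types are caught before their base classes.
template<typename Step>
bool runAndReport(Step step, FollowPathResult & result)
{
  try {
    step();
    return true;
  } catch (const InvalidController & e) {
    result.error_code = FollowPathResult::INVALID_CONTROLLER;
    result.error_msg = e.what();
  } catch (const ControllerTFError & e) {
    result.error_code = FollowPathResult::TF_ERROR;
    result.error_msg = e.what();
  } catch (const std::exception & e) {
    result.error_code = FollowPathResult::UNKNOWN;
    result.error_msg = e.what();
  }
  return false;
}
```

Each catch branch already sets the code, so adding the message is one extra assignment per branch, which is the "repetitive work" mentioned above.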

Or perhaps don't store / serve them on the BTs? The error enums should be the actionable bits for logic. How do you imagine using those messages? I suppose that has an impact on what we do with it at the Navigator level and above.

For backward compatibility would you prefer new actions?

What would you have in mind? For main (Rolling, and, depending on how quickly this is done, potentially Jazzy), we should just support the existing actions with these adjustments. For Humble, I suppose we could have dual action definitions, but that seems tricky to navigate.

Adding a string error_message in addition to the current int error_code isn't hard to implement. It would help people who are not familiar with the error codes, since they wouldn't have to look up what each code means. And it would be a first step toward having error codes for the navigator actions.

I think storing the error code in enum format in the blackboard will simplify the logic for populating it at a higher level, for example by storing the last error code that occurred (from the BT node that fails).
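A minimal sketch of the "last error code" idea. It uses a plain map as a stand-in for the BT blackboard rather than the BehaviorTree.CPP API, and the key name last_error_code and the helper onNodeFailure are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Minimal stand-in for a BT blackboard (not the BehaviorTree.CPP API).
class MockBlackboard
{
public:
  void set(const std::string & key, uint16_t value) {entries_[key] = value;}

  std::optional<uint16_t> get(const std::string & key) const
  {
    auto it = entries_.find(key);
    if (it == entries_.end()) {return std::nullopt;}
    return it->second;
  }

private:
  std::unordered_map<std::string, uint16_t> entries_;
};

// Hypothetical hook run when a BT action node fails: overwrite one shared
// entry so a higher level only has to read a single key.
void onNodeFailure(MockBlackboard & bb, uint16_t node_error_code)
{
  bb.set("last_error_code", node_error_code);
}
```

The trade-off of a single overwritten key is that earlier failures in the same run are lost; only the most recent one survives.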

Regarding backward compatibility, there are no error codes in Humble, so if this feature is implemented, it can't be backported to Humble.

How do you imagine using those messages?

My use case for error messages is to be able to bubble up reasons that the behavior tree action aborted to a user interface. I foresee it helping a wider spectrum of people being able to reason about behavior and ultimately lead to them coming up with ideas for behavior tree improvements.

I think, right now, that it is reasonable that the error message strings are not put into the behavior tree blackboard. (Limited experience so I could be easily swayed on this front). I agree that behavior tree logic that responds to errors is best done against enums rather than string matching. If the behavior tree is complete/complex enough to handle the errors encountered, then the noise of error messages is not necessarily interesting.

On the other hand, a dedicated published log of error messages interleaved with successful behavior transitions could be very useful for explainability. Maybe putting the error message strings into the blackboard would facilitate that. Maybe a ReportError action (or decorator?) could control which messages are deemed important enough to make it into such a log.
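A hypothetical sketch of how such a ReportError filter might decide what reaches the log: successful transitions are always recorded, and failures only if their code is in a configured set. ExplainabilityLog and its methods are illustrative names, not an existing Nav2 or BehaviorTree.CPP component.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical explainability log with a "reportable codes" filter.
class ExplainabilityLog
{
public:
  explicit ExplainabilityLog(std::set<uint16_t> reportable)
  : reportable_(std::move(reportable)) {}

  // Successful behavior transitions are always recorded.
  void transition(const std::string & node) {entries_.push_back("OK: " + node);}

  // Failures are recorded only if their code was configured as important.
  void error(const std::string & node, uint16_t code, const std::string & msg)
  {
    if (reportable_.count(code) != 0) {
      entries_.push_back("ERROR " + std::to_string(code) + " in " + node + ": " + msg);
    }
  }

  const std::vector<std::string> & entries() const {return entries_;}

private:
  std::set<uint16_t> reportable_;
  std::vector<std::string> entries_;
};
```

Filtering on the enum rather than the string keeps the "what is important" decision consistent with the enum-based BT logic, while the string only serves the human reading the log.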

For backward compatibility would you prefer new actions?
What would you have in mind?

I agree that dual action definitions like NavigateToPoseWithErrorString/NavigateThroughPosesWithErrorString or NavigateToPose2/NavigateThroughPoses2 would be distasteful: they would either be a naive cut-and-paste or possibly use some template logic to detect the presence of, and populate, an error_message string in an action Result class.
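The "template logic" option could look roughly like the C++17 detection idiom below. This is a hypothetical sketch: has_error_msg, setError, LegacyResult, and NewResult are illustrative names, and the field is spelled error_msg (the name used in the interface proposal) rather than error_message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

// Detection idiom: true if T has a member error_msg assignable from std::string.
template<typename T, typename = void>
struct has_error_msg : std::false_type {};

template<typename T>
struct has_error_msg<T, std::void_t<decltype(std::declval<T &>().error_msg = std::string{})>>
  : std::true_type {};

// Hypothetical result types: one without and one with the string field.
struct LegacyResult
{
  uint16_t error_code{0};
};
struct NewResult
{
  uint16_t error_code{0};
  std::string error_msg;
};

// Always sets the code; sets the message only if the Result type has the field.
template<typename ResultT>
void setError(ResultT & result, uint16_t code, const std::string & msg)
{
  result.error_code = code;
  if constexpr (has_error_msg<ResultT>::value) {
    result.error_msg = msg;
  }
}
```

This avoids duplicating server code per action definition, but it still requires maintaining two sets of action types, which is the part that makes dual definitions unattractive.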

I personally build from source on humble and track close to main by periodically backporting it to humble myself. I would like to change the existing action result messages to include the error_message string and can accept that this will not be available in humble/iron.

You have a wider audience, hence my checking in on your appetite for such a change. My read is that it would not be a back ported change to the interface.

I think, right now, that it is reasonable that the error message strings are not put into the behavior tree blackboard. (Limited experience so I could be easily swayed on this front).

We can always add it later too. We can (and probably should) follow the same patterns we just used for the error IDs, now with messages. How are the messages proping to your User Interface then?

I personally build from source on humble and track close to main by periodically backporting it to humble myself.

That seems like a reasonable solution for this. However, if you can add the error_msg to the interfaces ASAP and open a PR, I can merge that before we do the Jazzy cut. That would make this feature shippable with Jazzy later on, since the interface change is the major ABI-breaking change we need to be aware of for backporting. A follow-up PR with the implementation can then be done on whatever schedule makes the most sense for you.

Would you be able to do that ASAP? I'm looking to cut Jazzy today... but I could delay to Monday or Tuesday if that makes a difference.

My read is that it would not be a back ported change to the interface.

Correct. Because of API/ABI stability for existing users, we can't break people's robots out in the wild.

I just saw this. I'll have a go now.

See #4459

Here is a second pull request with a first pass at populating error_msg, and error_code where it was missing (FollowWaypoints).

See #4460

It builds but is untested and probably incomplete so I left it as draft.
You may have comments on the style.

Added a second pass to use the error_code and error_msg expected in each action in nav2_simple_navigator.

I haven't run them yet.

@SteveMacenski wrote a while back:
How are the messages proping to your User Interface then?

Finally figured out that "proping" probably means propagating.

The error messages are currently not propagating to my user interface. Hence I went searching for why, came across this issue and also felt the pain point. With these changes I will be able to get the errors to our user interface.

And now I am at the nav2_bt_navigator/src/navigators level and staring down the catch(...) and wondering if that is my target.

Now I am thinking that ResultStatus needs to be the one carrying the error message.

The error messages are currently not propagating to my user interface. Hence I went searching for why, came across this issue and also felt the pain point. With these changes I will be able to get the errors to our user interface.

This is why I asked how you planned to propagate the error messages through to your interface. You'll see that the individual BT nodes whose actions have error codes populate those codes in the BT blackboard. The BT Navigator then aggregates them into the action's result message, reporting the highest-priority failure (realistically there should only be one at a time, though there can be several if multiple things go wrong simultaneously).
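A sketch of that aggregation step, assuming for illustration that 0 means "no error" and that among non-zero codes the lowest value is treated as the highest priority; the actual BT Navigator ordering may differ. The blackboard is modeled as a plain std::map, highestPriorityErrorCode is a hypothetical name, and the key names are only examples.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Pick one error code for the navigator result from per-node blackboard entries.
// Assumption for this sketch: 0 means success and a lower non-zero code wins.
uint16_t highestPriorityErrorCode(
  const std::map<std::string, uint16_t> & blackboard,
  const std::vector<std::string> & error_code_names)
{
  uint16_t best = std::numeric_limits<uint16_t>::max();
  bool found = false;
  for (const auto & name : error_code_names) {
    auto it = blackboard.find(name);
    if (it == blackboard.end() || it->second == 0) {
      continue;  // key absent, or that node reported no error
    }
    if (it->second < best) {
      best = it->second;
    }
    found = true;
  }
  return found ? best : 0;
}
```

Because each node writes its own key, simultaneous failures are all visible to the navigator, which then reduces them to the single error_code field of the result.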

If you want Nav2Pose and NavThruPoses to report the error message, it will need to propagate through the tree into the Navigator's action result callback. If you track down the error_id logic in the BTs, you can see how we did that. It has nothing to do with the feedback exception catching.
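One way the message could follow the same path as the code, sketched under stated assumptions: each BT node exports both an error code and an error message under a shared prefix, and the navigator reads the message belonging to whichever node it reports. ErrorBlackboard, ActionResult, exportResult, messageFor, and the "<prefix>_error_msg" naming are hypothetical, not existing Nav2 conventions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical action result carrying the proposed string field.
struct ActionResult
{
  uint16_t error_code;
  std::string error_msg;
};

// Stand-in blackboard with separate maps for codes and messages.
struct ErrorBlackboard
{
  std::map<std::string, uint16_t> codes;
  std::map<std::string, std::string> msgs;
};

// BT node side (e.g. on abort): export code and message under one prefix.
void exportResult(ErrorBlackboard & bb, const std::string & prefix, const ActionResult & r)
{
  bb.codes[prefix + "_error_code"] = r.error_code;
  bb.msgs[prefix + "_error_msg"] = r.error_msg;
}

// Navigator side: fetch the message paired with the chosen node's code.
std::string messageFor(const ErrorBlackboard & bb, const std::string & prefix)
{
  auto it = bb.msgs.find(prefix + "_error_msg");
  return it == bb.msgs.end() ? std::string{} : it->second;
}
```

Pairing by prefix keeps the message tied to the same node whose code wins the priority selection, so the result never mixes a code from one node with text from another.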

Now I am thinking that ResultStatus needs to be the one carrying the error message.

That's for the behavior server, so it's the wrong place.