flexivrobotics/flexiv_rdk

[BUG] Timeliness monitor fails under repeated start/stop

Closed this issue · 7 comments

acf986 commented

Version information

  • RDK: v0.8
  • Robot software: v2.10.5
  • OS: Ubuntu 20.04 - x86_64

Describe the bug
Timeliness monitor cut off the RDK connection under repeated mode switching.

Steps to reproduce
Compile and run the following minimum reproducible example (make sure your robot arm is not in a singularity, and modify the hard coded robot IP and local IP):

#include <flexiv/Robot.hpp>
#include <flexiv/Exception.hpp>
#include <flexiv/Log.hpp>
#include <flexiv/Scheduler.hpp>
#include <flexiv/Utility.hpp>
#include <flexiv/Model.hpp>
#include <thread>
#include <iostream>

void periodicTask(flexiv::Robot& robot, flexiv::Scheduler& scheduler, flexiv::Log& log,
                  const std::vector<double>& initTcpPose)
{
  static int i = 0;
  static int N = 1000; // It works if N = 5000(5s)
  if (i % N == 0)
    robot.setMode(flexiv::Mode::RT_CARTESIAN_MOTION_FORCE_BASE);
  else if(i % N > 0 && i % N < N-1)
    robot.streamCartesianMotionForce(initTcpPose);
  else
    robot.stop();

  i++;

  if (i>10000)
  {
    robot.stop();
    scheduler.stop();
  }
}

int main(int argc, char* argv[])
{

  // Program Setup
  // =============================================================================================
  // Logger for printing message with timestamp and coloring
  flexiv::Log log;

  // IP of the robot server
  std::string robotIP = "192.168.2.100";
  // IP of the workstation C running this program
  std::string localIP = "192.168.2.109";

  try {
    // RDK Initialization
    // =========================================================================================
    // Instantiate robot interface
    flexiv::Robot robot(robotIP, localIP);

    // Create data struct for storing robot states
    flexiv::RobotStates robotStates;

    // Clear fault on robot server if any
    if (robot.isFault()) {
      log.warn("Fault occurred on robot server, trying to clear ...");
      // Try to clear the fault
      robot.clearFault();
      std::this_thread::sleep_for(std::chrono::seconds(2));
      // Check again
      if (robot.isFault()) {
        log.error("Fault cannot be cleared, exiting ...");
        return 1;
      }
      log.info("Fault on robot server is cleared");
    }

    // Enable the robot, make sure the E-stop is released before enabling
    log.info("Enabling robot ...");
    robot.enable();

    // Wait for the robot to become operational
    int secondsWaited = 0;
    while (!robot.isOperational()) {
      std::this_thread::sleep_for(std::chrono::seconds(1));
      if (++secondsWaited == 10) {
        log.warn(
              "Still waiting for robot to become operational, please check that the robot 1) "
              "has no fault, 2) is in [Auto (remote)] mode");
      }
    }
    log.info("Robot is now operational");

    //Test the scheduler
    // Set initial TCP pose
    robot.getRobotStates(robotStates);
    auto initTcpPose = robotStates.tcpPose;
    log.info("Initial TCP pose set to [position 3x1, rotation (quaternion) 4x1]: "
             + flexiv::utility::vec2Str(initTcpPose));

    flexiv::Scheduler scheduler;
    scheduler.addTask(std::bind(periodicTask, std::ref(robot),
                                std::ref(scheduler), std::ref(log),
                                std::ref(initTcpPose)),
                      "HP", 1, scheduler.maxPriority());
    scheduler.start();


  }
  catch (const flexiv::Exception& e)
  {
    return 1;
  }
  return 0;
}

The above code does the following:

  1. Switch to RT_CARTESIAN_MOTION_FORCE_BASE mode.
  2. Stream command normally for 1 second. (if changed to execute 5 seconds, then no issue)
  3. Stop robot, and goto step 1.

Expected behavior
The program work as usual.

Screenshots

[2023-10-11 13:54:59.296] [info] ================================================
[2023-10-11 13:54:59.296] [info] ===              Flexiv RDK 0.8              ===
[2023-10-11 13:54:59.296] [info] ================================================
[2023-10-11 13:54:59.297] [info] [flexiv::Robot] Network service started [RT:99]
[2023-10-11 13:54:59.797] [info] [flexiv::Robot] Connection established
[2023-10-11 13:54:59.799] [info] [flexiv::Robot] Server information:
[2023-10-11 13:54:59.799] [info] Serial number: Rizon 4s-062060
[2023-10-11 13:54:59.799] [info] Degrees of freedom: 7
[2023-10-11 13:54:59.799] [info] Software version: v2.10.5
[2023-10-11 13:54:59.802] [info] [flexiv::Robot] Initialization complete
[2023-10-11 13:54:59.802] [info] Enabling robot ...
[2023-10-11 13:54:59.827] [info] Robot is now operational
[2023-10-11 13:54:59.827] [info] Initial TCP pose set to [position 3x1, rotation (quaternion) 4x1]: 0.651 -0.165 0.461 0.091 0.022 0.995 0.030 
[2023-10-11 13:54:59.842] [info] [flexiv::Scheduler] Initialization complete
[2023-10-11 13:54:59.842] [info] [flexiv::Scheduler] Maximum available priority for the user task is [97]
[2023-10-11 13:54:59.842] [info] [flexiv::Scheduler] Added task [HP]: interval [1 ms], priority [97], CPU affinity [-1]
[2023-10-11 13:54:59.842] [info] [flexiv::Scheduler] Starting all tasks
[2023-10-11 13:55:01.847] [info] [flexiv::Robot] Operation mode transited to [RT_CARTESIAN_MOTION_FORCE_BASE]
[2023-10-11 13:55:02.853] [info] [flexiv::Robot] Operation mode transited to [IDLE]
[2023-10-11 13:55:03.693] [warning] [Timeliness Monitor] 10 missed messages in 5s
[2023-10-11 13:55:03.702] [warning] [Timeliness Monitor] 20 missed messages in 5s, RDK connection is about to close

Additional context
NA

acf986 commented

And by modifying the periodicTask to

void periodicTask(flexiv::Robot& robot, flexiv::Scheduler& scheduler, flexiv::Log& log,
                  const std::vector<double>& initTcpPose)
{
  static int i = 0;
  static int N = 8000; // It works if N = 5000(5s)
  if (i % N == 0)
  {
    log.info("Start");
    robot.setMode(flexiv::Mode::RT_CARTESIAN_MOTION_FORCE_BASE);
  }
  else if(i % N > 0 && i % N < N-1)
  {
    if (i%100 == 0)
      log.info("Stream " + std::to_string(i));

    robot.streamCartesianMotionForce(initTcpPose);
  }
  else
  {
    log.info("-----------STOP------------");
    robot.stop();
  }

  i++;

  if (i>10000)
  {
    robot.stop();
    scheduler.stop();
  }
}

We noticed that for the several hundreds of cycles of periodicTask after the robot.setMode(flexiv::Mode::RT_CARTESIAN_MOTION_FORCE_BASE); call, the actual frequency is no-longer at 1000Hz, but at a frequency much higher than 1000 Hz.

@acf986 I modified your example to make sure the commands are streamed for 1s instead of 0.999s in your original example, but I still observed the timeliness monitor error. If I increased the N to 2000 (2s), then no issue. @pzhu-flexiv will check on this issue.

void periodicTask(flexiv::Robot& robot, flexiv::Scheduler& scheduler, flexiv::Log& log,
    const std::vector<double>& initTcpPose)
{
    static int i = 0;
    static int N = 1000;
    if (i % (N + 2) == 0) {
        robot.setMode(flexiv::Mode::RT_CARTESIAN_MOTION_FORCE_BASE);
    } else if (i  % (N + 2) == (N + 1)) {
        robot.stop();
    } else {
        robot.streamCartesianMotionForce(initTcpPose);
    }

    i++;

    if (i > 10000) {
        robot.stop();
        scheduler.stop();
    }
}
acf986 commented

@munseng-flexiv May I know is your testing computer patched with a realtime kernel?

@munseng-flexiv May I know is your testing computer patched with a realtime kernel?

My workstation is Ubuntu 20.04 and is not patched with the real-time kernel. It might be the reason for the bug due to less stable whole-loop latency.

@acf986 Using the very original code snippet your provided without any modification, I can't replicate the issue on RDK v0.9 with a real robot running robot software v2.10.7. I think the code cleanup and logic robustness improvements in RDK v0.9 has already fixed this bug.
image

acf986 commented

May I know when will v0.9 be formally released?

May I know when will v0.9 be formally released?

The software package is at the last stage of QC, it should be available by the end of next week.