[BUG] Timeliness monitor fails under repeated start/stop
Closed this issue · 7 comments
Version information
- RDK: v0.8
- Robot software: v2.10.5
- OS: Ubuntu 20.04 - x86_64
Describe the bug
Timeliness monitor cut off the RDK connection under repeated mode switching.
Steps to reproduce
Compile and run the following minimum reproducible example (make sure your robot arm is not in a singularity, and modify the hard coded robot IP and local IP):
#include <flexiv/Robot.hpp>
#include <flexiv/Exception.hpp>
#include <flexiv/Log.hpp>
#include <flexiv/Scheduler.hpp>
#include <flexiv/Utility.hpp>
#include <flexiv/Model.hpp>
#include <thread>
#include <iostream>
void periodicTask(flexiv::Robot& robot, flexiv::Scheduler& scheduler, flexiv::Log& log,
const std::vector<double>& initTcpPose)
{
static int i = 0;
static int N = 1000; // It works if N = 5000(5s)
if (i % N == 0)
robot.setMode(flexiv::Mode::RT_CARTESIAN_MOTION_FORCE_BASE);
else if(i % N > 0 && i % N < N-1)
robot.streamCartesianMotionForce(initTcpPose);
else
robot.stop();
i++;
if (i>10000)
{
robot.stop();
scheduler.stop();
}
}
int main(int argc, char* argv[])
{
// Program Setup
// =============================================================================================
// Logger for printing message with timestamp and coloring
flexiv::Log log;
// IP of the robot server
std::string robotIP = "192.168.2.100";
// IP of the workstation C running this program
std::string localIP = "192.168.2.109";
try {
// RDK Initialization
// =========================================================================================
// Instantiate robot interface
flexiv::Robot robot(robotIP, localIP);
// Create data struct for storing robot states
flexiv::RobotStates robotStates;
// Clear fault on robot server if any
if (robot.isFault()) {
log.warn("Fault occurred on robot server, trying to clear ...");
// Try to clear the fault
robot.clearFault();
std::this_thread::sleep_for(std::chrono::seconds(2));
// Check again
if (robot.isFault()) {
log.error("Fault cannot be cleared, exiting ...");
return 1;
}
log.info("Fault on robot server is cleared");
}
// Enable the robot, make sure the E-stop is released before enabling
log.info("Enabling robot ...");
robot.enable();
// Wait for the robot to become operational
int secondsWaited = 0;
while (!robot.isOperational()) {
std::this_thread::sleep_for(std::chrono::seconds(1));
if (++secondsWaited == 10) {
log.warn(
"Still waiting for robot to become operational, please check that the robot 1) "
"has no fault, 2) is in [Auto (remote)] mode");
}
}
log.info("Robot is now operational");
//Test the scheduler
// Set initial TCP pose
robot.getRobotStates(robotStates);
auto initTcpPose = robotStates.tcpPose;
log.info("Initial TCP pose set to [position 3x1, rotation (quaternion) 4x1]: "
+ flexiv::utility::vec2Str(initTcpPose));
flexiv::Scheduler scheduler;
scheduler.addTask(std::bind(periodicTask, std::ref(robot),
std::ref(scheduler), std::ref(log),
std::ref(initTcpPose)),
"HP", 1, scheduler.maxPriority());
scheduler.start();
}
catch (const flexiv::Exception& e)
{
return 1;
}
return 0;
}
The above code does the following:
- Switch to RT_CARTESIAN_MOTION_FORCE_BASE mode.
- Stream command normally for 1 second. (if changed to execute 5 seconds, then no issue)
- Stop robot, and goto step 1.
Expected behavior
The program work as usual.
Screenshots
[2023-10-11 13:54:59.296] [info] ================================================
[2023-10-11 13:54:59.296] [info] === Flexiv RDK 0.8 ===
[2023-10-11 13:54:59.296] [info] ================================================
[2023-10-11 13:54:59.297] [info] [flexiv::Robot] Network service started [RT:99]
[2023-10-11 13:54:59.797] [info] [flexiv::Robot] Connection established
[2023-10-11 13:54:59.799] [info] [flexiv::Robot] Server information:
[2023-10-11 13:54:59.799] [info] Serial number: Rizon 4s-062060
[2023-10-11 13:54:59.799] [info] Degrees of freedom: 7
[2023-10-11 13:54:59.799] [info] Software version: v2.10.5
[2023-10-11 13:54:59.802] [info] [flexiv::Robot] Initialization complete
[2023-10-11 13:54:59.802] [info] Enabling robot ...
[2023-10-11 13:54:59.827] [info] Robot is now operational
[2023-10-11 13:54:59.827] [info] Initial TCP pose set to [position 3x1, rotation (quaternion) 4x1]: 0.651 -0.165 0.461 0.091 0.022 0.995 0.030
[2023-10-11 13:54:59.842] [info] [flexiv::Scheduler] Initialization complete
[2023-10-11 13:54:59.842] [info] [flexiv::Scheduler] Maximum available priority for the user task is [97]
[2023-10-11 13:54:59.842] [info] [flexiv::Scheduler] Added task [HP]: interval [1 ms], priority [97], CPU affinity [-1]
[2023-10-11 13:54:59.842] [info] [flexiv::Scheduler] Starting all tasks
[2023-10-11 13:55:01.847] [info] [flexiv::Robot] Operation mode transited to [RT_CARTESIAN_MOTION_FORCE_BASE]
[2023-10-11 13:55:02.853] [info] [flexiv::Robot] Operation mode transited to [IDLE]
[2023-10-11 13:55:03.693] [warning] [Timeliness Monitor] 10 missed messages in 5s
[2023-10-11 13:55:03.702] [warning] [Timeliness Monitor] 20 missed messages in 5s, RDK connection is about to close
Additional context
NA
And by modifying the periodicTask to
void periodicTask(flexiv::Robot& robot, flexiv::Scheduler& scheduler, flexiv::Log& log,
const std::vector<double>& initTcpPose)
{
static int i = 0;
static int N = 8000; // It works if N = 5000(5s)
if (i % N == 0)
{
log.info("Start");
robot.setMode(flexiv::Mode::RT_CARTESIAN_MOTION_FORCE_BASE);
}
else if(i % N > 0 && i % N < N-1)
{
if (i%100 == 0)
log.info("Stream " + std::to_string(i));
robot.streamCartesianMotionForce(initTcpPose);
}
else
{
log.info("-----------STOP------------");
robot.stop();
}
i++;
if (i>10000)
{
robot.stop();
scheduler.stop();
}
}
We noticed that for the several hundreds of cycles of periodicTask after the robot.setMode(flexiv::Mode::RT_CARTESIAN_MOTION_FORCE_BASE);
call, the actual frequency is no-longer at 1000Hz, but at a frequency much higher than 1000 Hz.
@acf986 I modified your example to make sure the commands are streamed for 1s instead of 0.999s in your original example, but I still observed the timeliness monitor error. If I increased the N to 2000 (2s), then no issue. @pzhu-flexiv will check on this issue.
void periodicTask(flexiv::Robot& robot, flexiv::Scheduler& scheduler, flexiv::Log& log,
const std::vector<double>& initTcpPose)
{
static int i = 0;
static int N = 1000;
if (i % (N + 2) == 0) {
robot.setMode(flexiv::Mode::RT_CARTESIAN_MOTION_FORCE_BASE);
} else if (i % (N + 2) == (N + 1)) {
robot.stop();
} else {
robot.streamCartesianMotionForce(initTcpPose);
}
i++;
if (i > 10000) {
robot.stop();
scheduler.stop();
}
}
@munseng-flexiv May I know is your testing computer patched with a realtime kernel?
@munseng-flexiv May I know is your testing computer patched with a realtime kernel?
My workstation is Ubuntu 20.04 and is not patched with the real-time kernel. It might be the reason for the bug due to less stable whole-loop latency.
@acf986 Using the very original code snippet your provided without any modification, I can't replicate the issue on RDK v0.9 with a real robot running robot software v2.10.7. I think the code cleanup and logic robustness improvements in RDK v0.9 has already fixed this bug.
May I know when will v0.9 be formally released?
May I know when will v0.9 be formally released?
The software package is at the last stage of QC, it should be available by the end of next week.