ftctechnh/ftc_app

"Restart Robot" abandons watchdog on active OpMode

Windwoes opened this issue · 12 comments

As I'm sure anyone who has ever written an autonomous OpMode knows, if your thread does not exit within 1000ms of a stop being requested, the watchdog growls, and the app force-crashes itself in order to terminate your rogue thread. (This is necessary because it is not possible to kill -9 a single thread).

However, I have found a serious bug in the implementation of the watchdog: it appears that the watchdog of the active OpMode is abandoned when a "Restart Robot" command is issued. This causes the RC device to perform a restart robot as if everything were normal, but the rogue thread continues to run in the background.

Here's a sample OpMode:

@TeleOp
public class RogueLoop extends LinearOpMode
{
    @Override
    public void runOpMode()
    {
        waitForStart();

        while (true) //muahaha Thread.interrupt() ain't gonna do you any good here!
        {
            long iT = System.currentTimeMillis();

            while (System.currentTimeMillis() - iT < 500)
            {
                Thread.yield();
            }

            System.out.println("Sysout from rogue loop");
        }
    }
}

Steps to reproduce:

  1. Deploy the SDK to the RC phone with the above OpMode added to the TeamCode module
  2. Ensure that Android Studio is printing logcat to the bottom pane
  3. Ensure the DS and RC phones are connected as normal
  4. Init and run the "RogueLoop" OpMode
  5. Observe that the OpMode prints a message to logcat every 500ms
  6. While the OpMode is still running, press "Restart Robot".
  7. Wait for the "Restart Robot" to complete
  8. Observe that the rogue OpMode continues to print messages to the logcat every 500ms, despite the DS and RC showing every indication that only the "StopRobot" OpMode is being run

@gearsincorg will this be fixed in v5.0 too?

No. I've added it as an issue, but I'm not skilled in this area to attempt a fix. (Suggestions?)
5.0 is an early-bird release, primarily for the Control Hub Pilot teams.
The hope is to have another 5.x pre-release, and then the main 5.y release in September.

@gearsincorg I did some digging and it appears this issue can be fixed by adding the following as the first thing in EventLoopManager.stopEventLoop():

    if(eventLoop.getOpModeManager() != null)
    {
        eventLoop.getOpModeManager().stopActiveOpMode();
    }

Excellent... I can implement this in the current private repo and see if I can replicate your results.

We appreciate your efforts to find and suggest fixes for these problem.

Phil.

@gearsincorg any update on this?

Also, one other thing, I'd like to revise my suggested fix to:

    if(eventLoop.getOpModeManager() != null)
    {
        eventLoop.getOpModeManager().initActiveOpMode(OpModeManagerImpl.DEFAULT_OP_MODE_NAME);
    }

Why the switch?

Why the switch?

Because that's what's done when the DS sends the command to stop an OpMode. It doesn't call stopActiveOpMode(), instead it calls initActiveOpMode(OpModeManagerImpl.DEFAULT_OP_MODE_NAME)

@FROGbots-4634 I've tested both of your proposed fixes.

The first one successfully detects the stuck loop condition and aborts the running thread the same way it would do if a stop had been requested. The DS is sent the Stuck-Loop error and the RC restarts the app and comes up in a ready to run condition.

The second suggestion did not appear to have any effect. The stuck opmode continued to run through the restart operation, and stayed in memory to permit multiple instances to execute.

After some more testing, I'll be submitting the first approach as a change to the code base.

Thanks for your input.

@gearsincorg Ok cool! I never actually tested the second suggestion..... though don't really understand why it wouldn't work, given that's what the DS does. Then again I don't quite fully understand the state machine in OpModeManagerImpl...

@ftctechnh does this mean it was fixed in the internal repo?

@FROGbots-4634
Yes. Thanks for the fix! It works well.