ftctechnh/ftc_app

BUG: OnBot Java overwrites files

rdrpenguin04 opened this issue · 19 comments

Hello! I am Ray, and our team experienced a serious problem with OnBot Java recently.

One of the unique things that our team does with our Java code is that we split it between many different files. A consequence of this is that we wind up switching back and forth between files a lot. This shouldn't be an issue.

However, we think that we have found a bug. When switching between files, occasionally the file being opened will overwrite the file being closed (or maybe the other way around, we aren't entirely sure). As such, the old file is completely lost. For example, our DriveMain TeleOp got overwritten by a bare-bones Autonomous that we had just started writing.

Our phone is an older Motorola, if I remember correctly.

Steps to reproduce:

  1. Create multiple files in the OnBot Java interface (we have 8-12). Make sure that they are distinct from one another so that the result can be observed.
  2. Type in one file, then quickly switch to another file in the list. Repeat until corruption occurs.

We don't know anything besides this. If there's anything that we can do to help fix the problem, let us know.

@rdrpenguin04

If you can reliably reproduce the error, what happens if you click the project view pane before switching files, but after typing, for each iteration of the test?

OnBotJava automatically saves the file routinely while editing, but any UI interaction outside of editing code also triggers a save (including changing files, though in your case that seems to be where the problem is).

Additionally, you can try changing files while the current file is being swapped out (just try to rapidly switch files, don't edit them). Is there any noticeable increase or decrease in the rate of corruption?

I'm not sure how consistent this is, and I won't be able to test again until Monday. However, I'll go ahead and check that. By the project view pane, I assume you mean the sidebar?

I'm pretty sure that rapidly swapping works as well to reproduce, but I'm not sure. We never catch it before it happens, and usually, when it happens, we're too busy trying to recover to try different things. So, honestly, all I know is that switching files causes this problem, and maybe something else in addition to that makes it worse.

I'll check when I get back to the robot phone on Monday, and I'll note my findings back here then. Thank you for the quick response by the way!

Scratch that, I'll have to wait until Tuesday. I have a couple of complications preventing me from doing anything on Monday.
In the meantime, I think I'll see if I can get an emulator running that can test this.

The problem with running the RC app in an emulator is that it has native code, for which we don't provide an x86 version. That means it will only run on an ARM-based virtual device, which is dramatically slower.

We have no idea how to reproduce.

When I was trying what you suggested, as well as trying my original reproduction instructions, we were unable to get the glitch to occur. All of our files that we have tested remained uncorrupted. I'll see if I can try anything else, but, since we're coming up on competition soon, it isn't top priority.

As a side note, I'll potentially be making another bug report for the RC in general that screwed us today, which is what caused me to not be able to test further on this.

Well, we just accidentally made the bug happen again a couple of days ago, and I cannot tell you what we did differently. We had a backup of the file though, so we were able to recover quickly.

Does anybody know what could be causing this? If not, could someone tell me where in this repo is the code for running OnBot so that I may take a look for myself?

@rdrpenguin04 the source code for 99% of the SDK is hidden away inside AAR files. Here's a link to the OnBot source files in a repository that has all the code opened up: https://github.com/OpenFTC/Extracted-RC/tree/master/OnBotJava/src/main/java/org/firstinspires/ftc/onbotjava

Wow! Impressive work extracting all of that!

I'm currently looking around trying to find some JavaScript code to start with. As people who have tried multithreading in JS know, there is no possibility for race conditions in normal JS. However, if two XMLHttpRequests or something like that happen, the order in which they are processed server-side can be different, which can cause a server-side race condition, which can cause the symptoms mentioned earlier.
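To make the ordering problem concrete, here is a minimal sketch of one way a server could protect itself against save requests that arrive out of order. This is not OnBot Java's actual code: `VersionedFileStore`, `save`, `read`, and the per-file version number are all hypothetical names invented for the example. It assumes the client attaches an increasing version number to every save.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a server-side file store.
// Not taken from OnBot Java; it only illustrates rejecting stale saves.
final class VersionedFileStore {
    private final Map<String, String> contents = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Applies the save only if its version is newer than the stored one.
    // Returns true if the write was applied, false if it was rejected as stale.
    synchronized boolean save(String fileName, long version, String text) {
        long current = versions.getOrDefault(fileName, 0L);
        if (version <= current) {
            return false; // an older request arrived late; ignore it
        }
        versions.put(fileName, version);
        contents.put(fileName, text);
        return true;
    }

    // Returns the stored text, or null if the file was never saved.
    synchronized String read(String fileName) {
        return contents.get(fileName);
    }
}
```

With something like this, a late request for version 1 of a file would be dropped if version 2 had already been written, and a save for one file name could never touch another file's entry.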

@NoahAndrews didn't you say that there was a bug with FileBasedLock?

Yes, FileBasedLock is currently broken, and that may contribute to this behavior, but it is not the entire cause. There is an internal PR that attempts to fix this.

Ya, this happened to us repeatedly today -- lost a couple hours of work -- we just had to walk away in frustration at the end of the day because it happened again (before we expected it to, so we hadn't backed up yet).

We absolutely love the ease of OnBotJava (after having used Android Studio for years), but we're not sure how to help reproduce it. Aside from copy/pasting into other files on the computer (that are backed up), is there a built-in way to back up our files? That would certainly mitigate this issue if it were available. Thank you for all your work on this!

There is no built-in way, but there is ftc_http, which came to our rescue recently. As long as you can connect a Windows, Mac, or Linux computer to the phone via WiFi (same as programming OnBot normally), you can run it to download your files to another directory. I recommend using that.
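Once the files are on the computer, keeping timestamped copies is easy to script. The sketch below is only an illustration and does not talk to the phone or to ftc_http: `SourceBackup`, `backup`, and the `backup-yyyyMMdd-HHmmss` folder naming are assumptions made up for this example. It copies the `.java` files found directly inside a local folder into a fresh timestamped folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: snapshots already-downloaded sources on the computer.
final class SourceBackup {
    // Copies every .java file directly inside sourceDir into a new
    // timestamped folder under backupRoot and returns that folder.
    static Path backup(Path sourceDir, Path backupRoot) throws IOException {
        String stamp = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
        Path target = Files.createDirectories(backupRoot.resolve("backup-" + stamp));

        List<Path> sources;
        try (Stream<Path> entries = Files.list(sourceDir)) {
            sources = entries
                    .filter(p -> Files.isRegularFile(p)
                            && p.getFileName().toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        for (Path source : sources) {
            Files.copy(source, target.resolve(source.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Running this after each download gives you a history of snapshots, so an overwritten file can be restored from the most recent good copy.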

However, of course, if this keeps happening randomly and without any apparent way to reproduce, no backup strategy will really be enough. @NoahAndrews What's the status of that PR, and when is it expected to be merged?

If anyone can "reliably" reproduce, that would be ideal.

Of course, if anyone can follow my instructions with my first comment on this thread, and report back, that's still useful.

Hi! I was wondering if anyone revisited this issue and found a solution? I'm experiencing this right now every time I try to code, and we have a competition in a couple of days. It's causing an extra problem: I'm not able to actually build any of the code to the robot, because when the files are overwritten, it causes errors with duplicate classes. So not only am I constantly losing code, it's also stopping me from building new code because the error happens so frequently.
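Since an overwritten file ends up declaring a class that doesn't match its file name, a rough check on backed-up copies can flag the damage early. The sketch below is a heuristic only, and every name in it (`ClassNameCheck`, `declaredPublicType`, `looksOverwritten`) is hypothetical. It assumes each file declares one public top-level type named after the file, and its regular expression can be fooled by a matching phrase inside a comment or string.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic: compares a file name with the first public type it declares.
final class ClassNameCheck {
    private static final Pattern TYPE_DECL = Pattern.compile(
            "\\bpublic\\s+(?:(?:abstract|final)\\s+)*(?:class|interface|enum)\\s+"
                    + "([A-Za-z_$][A-Za-z0-9_$]*)");

    // Returns the name of the first public class, interface, or enum found, if any.
    static Optional<String> declaredPublicType(String source) {
        Matcher m = TYPE_DECL.matcher(source);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True when the file declares a public type whose name differs from the file name.
    // Files with no public type are not flagged.
    static boolean looksOverwritten(String fileName, String source) {
        String expected = fileName.endsWith(".java")
                ? fileName.substring(0, fileName.length() - ".java".length())
                : fileName;
        return declaredPublicType(source)
                .map(name -> !name.equals(expected))
                .orElse(false);
    }
}
```

For example, a file named `DriveMain.java` whose contents declare `public class SimpleAuto` would be flagged, which is the same mismatch that produces the duplicate-class build errors.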

It just happened to us again a few days ago with our autonomous. The only solution at the moment is to keep regular backups, and, to try to avoid the issue, don't swap classes rapidly. That seems to be something that aggravates the issue. I'd look into things if I had more time, but I can only assume it's some kind of race condition somewhere. I hope this gets resolved soon, especially since phones are being phased out next year in favor of a device that could potentially have even worse issues due to being slimmed down.

Thanks for replying so quickly! It's super unfortunate that there's no fix, but I'll definitely back up everything and keep the class swapping in mind.

v5.4, which is sitting in a branch on the SkyStone repo, has some changes designed to help address this problem.

I would suggest migrating to it: https://github.com/FIRST-Tech-Challenge/SkyStone/tree/v5.4

v5.5 has changes which should further help to prevent this.