Android-x86 on Marss - Pipeline deadlocked
I ran into a pipeline deadlock while running my Android-x86 image on Marss. Here is how to reproduce the error.
Here is my marss and qemu version info:
$ git clone git://github.com/avadhpatel/marss.git
$ cd marss
$ git show --summary
commit 49fda4a45e5b29c7e05b9e456228a4d016831484
Merge: 6cf2d32 4ce18f7
Author: Brendan Fitzgerald <fitzfitsahero@gmail.com>
Date: Tue Aug 20 10:30:37 2013 -0700
Merge pull request #34 from dramninjasUMD/master
Build # of cores string with preprocessor
$ scons c=1 debug=2
$ ./qemu/qemu-system-x86_64 -version
QEMU emulator version 0.14.1, Copyright (c) 2003-2008 Fabrice Bellard
And we can start the simulation:
(1) $ ./qemu/qemu-system-x86_64 -m 4096 -hda ../path-to-disk/android-64.img -usbdevice mouse -usbdevice keyboard
I am using a customized Android-x86 image (you can download it here: https://www.dropbox.com/s/m83kei9zga82c35/android-64.img). Boot into the debug mode, which is non-graphical (during boot you need to type "exit" to continue booting). Here is the kernel info of this image:
# uname -a
Linux (none) 3.0.36-android-x86-eeepc+ #1 SMP PREEMPT Tue Aug 27 21:27:01 EDT 2013 x86_64 GNU/Linux
(2) I want to simulate this command:
# am start -a android.intent.action.Main -n com.android.calculator2/.Calculator
With a GUI, this command would launch the Calculator app. Without graphics, nothing visible happens. So now we add start_sim before this command and try to simulate it.
Switch to the qemu terminal (Ctrl+Alt+2), and type
(qemu) simconfig -machine single_core
Then switch back to the Android terminal (Ctrl+Alt+1), type
# cd /data/marss/
# ./start_sim ; am start -a android.intent.action.Main -n com.android.calculator2/.Calculator ; ./kill_sim
(I compiled start_sim and kill_sim statically from the source code provided on the Marss website.)
(3) The simulation starts:
Switching to simulation
And in my original terminal I can see the Completed Cycles scrolling down...
After a while, the simulation gets stuck and my original terminal's output stops on this line:
...
Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483
Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441
Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243 insns/sec: rip ffffffff81026c57
And then after a while qemu exits:
...
Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483
Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441
Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243
qemu-system-x86_64: ptlsim/build/core/ooo-core/ooo.cpp:929: bool ooo::OooCore::runcycle(void*): Assertion `0' failed.
Aborted
If we look at the code at ooo.cpp:929, we can see that the issue is still caused by "the pipeline could be deadlocked", but this message was not printed to the terminal.
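To check whether commits were still advancing right before the abort, the progress lines can be compared pairwise. The following is only a rough sketch: it assumes every progress line follows the "Completed N cycles, M commits" format shown above, and the function name progress_deltas is made up for illustration.

```python
import re

# Matches the "Completed <cycles> cycles, <commits> commits" prefix of a progress line.
PROGRESS = re.compile(r"Completed\s+(\d+)\s+cycles,\s+(\d+)\s+commits")

def progress_deltas(log_text):
    """Return (cycle_delta, commit_delta) between consecutive progress lines."""
    samples = [(int(c), int(n)) for c, n in PROGRESS.findall(log_text)]
    return [(c2 - c1, n2 - n1)
            for (c1, n1), (c2, n2) in zip(samples, samples[1:])]

# The three progress lines quoted in this report.
log = """\
Completed 24021000 cycles, 1774788 commits: 54459 Hz, 51483
Completed 24034000 cycles, 1786064 commits: 59305 Hz, 51441
Completed 24045000 cycles, 1797644 commits: 51526 Hz, 54243
"""

for cycles, commits in progress_deltas(log):
    print(f"{cycles} cycles, {commits} commits, {commits / cycles:.2f} commits/cycle")
```

For the lines quoted here, commits were still advancing at roughly one per cycle, so whatever stalls the pipeline happens after the last progress line that was printed.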
Just out of curiosity, have you tried running other, simpler binaries in this disk image? Maybe something like ls?
Yes, running simple things like ls is okay.
Thanks!
SF
The image doesn't boot on any of my repositories (tried everything from qemu 0.14 to bleeding edge).
After SeaBIOS initializes, the following message appears:
Booting from Hard Disk...
Error 16
I got it to boot on the master branch. I'll spend some time looking at it.
Thanks for your help!
By the way I have tried checking /proc/kallsyms but there wasn't any kernel
symbol that has an address corresponding to the virtual address that is
shown repetitively in the log file.
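One thing worth noting about this kind of lookup: /proc/kallsyms lists only the start address of each symbol, so an address in the middle of a function never matches an entry exactly. A nearest-preceding-symbol search is needed instead. Below is a minimal sketch of that search; the sample entries and the names load_kallsyms and nearest_symbol are hypothetical, and the rip from the progress line above is used only as a sample input (it may not be the address from the log file).

```python
import bisect

def load_kallsyms(lines):
    """Parse '/proc/kallsyms'-style lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def nearest_symbol(symbols, addr):
    """Return (name, offset) for the last symbol at or below addr, or None."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None  # addr lies below every known symbol
    base, name = symbols[i]
    return name, addr - base

# Hypothetical entries, NOT taken from the real image.
sample = [
    "ffffffff81026b00 T hypothetical_func_a",
    "ffffffff81026c40 t hypothetical_func_b",
    "ffffffff81026d00 T hypothetical_func_c",
]
symbols = load_kallsyms(sample)
print(nearest_symbol(symbols, 0xffffffff81026c57))
```

With a copy of the guest's /proc/kallsyms saved to a file, the same functions can be fed the open file object instead of the sample list. An address that belongs to user space will of course not resolve to any kernel symbol this way.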
@tj90241 I noticed that the image can get corrupted during download, which leads to the "Booting from Hard Disk..." error. If that happens, please download it again! Thanks!
Redownloaded; it was a corrupted image, thanks. I'll look into it this weekend.
@tj90241 Thanks Tyler!
I also noticed that qemu 1.2 supports networking for Android-x86 while qemu 0.14 doesn't. But I guess it doesn't matter for now.
PS: If any of you are interested in building your own Android-x86 image, here is how to do that: http://www.cs.duke.edu/~schfan/blog/blog/2013/09/13/making-an-android-x86-image-for-marss/ . Thanks!
Found the issue after looking quickly -- MARSS doesn't handle SMC properly. I'm surprised this bug hasn't arisen before now, but it makes sense that it's causing Java to tie up immediately as Java makes excessive use of SMC. Fortunately, it's not related to your image or anything -- thanks for the bug report.
Hi Tyler,
It is great news! Thanks so much for your help!
Could you tell me how you found out this issue? I am learning methods of
debugging in Marss. Also, since the issue is found, are there ways to fix
it? I'd like to help!
Thanks again!
Honestly, I guess most of it was just intuition. MARSS simulates almost everything perfectly -- as I said before, I haven't seen single_core deadlock in ages! Given that knowledge, and that it is widely known that the JVM uses SMC, I looked at the simulator and, lo and behold, it was fairly evident that SMC is not being handled correctly (there are even some unimplemented functions lying around...).
Thanks for finding the issue! Please excuse my limited knowledge in this area, but do you mean Self Modifying Code when you say SMC? If possible, could you please say more about the unimplemented functions you found?
I read the PTLSim manual (version 2007) and it mentioned how SMC is supported (page 31). But what exactly is causing the problem we have? Is it because Marss' "design eliminates forced invalidations when the kernel frees up a page containing code that's immediately overwritten with normal user data"?
I am just wondering what would be the best way to solve/work-around this issue, because running Android-x86 applications is crucial for my current research project. Although I can try looking for the specific functions in JVM and modify them to prevent Marss from crashing, it would be more convincing not to modify the guest OS. Do you think it's possible to fix the SMC related problem in Marss? If so, how long do you think it will take? If you can point out the necessary steps, I'd like to try working on it.
Many thanks!
Yes, I do mean self-modifying code when I abbreviate with SMC.
I did spend some time looking at it this weekend, but unfortunately the bug hasn't been as simple to repair as I had hoped. I have gotten the simulation to proceed further, but the guest either segfaults while running code that is self-modifying in simulation mode, or the pipeline just deadlocks (albeit at a later point in time than it did before the fixes).
The unimplemented function related to SMC is here:
https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L984
In many cases it's also very confusing which SMC function is actually being called! See:
https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L939
https://github.com/avadhpatel/marss/blob/master/ptlsim/x86/ptlhwdef.h#L1779
(one function accepts a physical address, and another accepts a virtual address).
I'm also not certain that all of these functions ever get called, either...
I have also noticed that the mfnlo and mfnhi variables of the RIPVirtPhys class are always set to zero in MARSS, which, as far as I can tell, is not how they are set in PTLsim. These variables are often used by the simulator in the parts of the code that check for and handle SMC, so I tried to fix that part of the problem. I can send you a patch of what I currently have offline if you e-mail me directly.
AFAIK, SMC did work in PTLsim; my guess is that it broke sometime after PTLsim was merged into MARSS (though it could also be that the bug was in the original PTLsim and wasn't fixed during the merge).
Unfortunately, I'm not sure that there is a way around the bug; that is to say, I'm not certain whether or not you can simply modify the JVM to skirt around this issue. It's certainly possible to fix it, but in my mind it's going to be a difficult bug to properly track down and solve. My next goal was to write a very small piece of SMC and try to reproduce the issue with it, so that the log is a more manageable size to read and the problem is easier to debug, but I ran out of time this weekend.
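For illustration, here is roughly what the smallest such piece of SMC could look like: generate a tiny function in memory, run it, overwrite one byte of it, and run it again. This is only a sketch and not the test that was actually written for this issue. It assumes an x86-64 Linux guest that has Python with ctypes available (the Android image in this thread does not, so it would need a different guest image or a port to a statically linked C program), and the name run_smc_once is made up.

```python
import ctypes
import mmap
import platform

def run_smc_once():
    """Run a tiny generated function, patch one byte of it, and run it again.

    Returns (first, second) return values, or None where this sketch
    does not apply (non-x86-64 host, or W+X mappings are forbidden).
    """
    if platform.machine() not in ("x86_64", "AMD64") or not hasattr(mmap, "PROT_EXEC"):
        return None  # the machine code below is x86-64 only
    code = b"\xb8\x01\x00\x00\x00\xc3"  # mov eax, 1 ; ret
    try:
        buf = mmap.mmap(-1, mmap.PAGESIZE,
                        prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    except OSError:
        return None  # host refuses writable+executable memory
    buf[:len(code)] = code
    # Turn the start of the mapping into a callable returning a C int.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    func = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    first = func()
    buf[1] = 2  # overwrite the immediate: the code is now mov eax, 2 ; ret
    second = func()
    return first, second

print(run_smc_once())
```

On real x86-64 hardware the two calls return 1 and then 2. Running the same thing between start_sim and kill_sim would show whether the simulator picks up the modified bytes; whether it triggers the same deadlock as Dalvik is not something this sketch can promise.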
Thanks so much for your help!
After seeing your comments, I first thought it was due to Android's JIT (just-in-time) execution mode. I turned it off system-wide, but that didn't help. Then I tested whether the issue is related to the Java virtual machine (Dalvik), and that seems to be the case.
(I need to point out that I tested Java in the Ubuntu disk image on Marss and it was okay.)
I added a Dalvik Executable file in the Android disk image and simply executing it will reproduce the error. I have updated the disk image file, please download it again: https://www.dropbox.com/s/m83kei9zga82c35/android-64.img .
Now if you boot the Android virtual machine (switch to the qemu terminal, run simconfig -machine single_core, and then switch back to the Android terminal) and type:
# su
# cd /data/marss/
# ./run_java.sh
the simulation will soon terminate because of the same pipeline deadlock issue.
@tj90241 I will email you directly regarding the patch file you have. Thanks!
PS: If you want to write your own java file and execute it in Android, here is how to do it: http://www.cs.duke.edu/~schfan/blog/blog/2013/09/19/executing-dex-file-in-android/.
Hi,
I am just wondering if anyone would still like to work on this issue. Thanks!