Fresh 9front install in QEMU hangs at boot
GoogleCodeExporter opened this issue · 25 comments
What happened:
I installed 9front onto a qcow2 image through QEMU. I finished the
installation, and QEMU rebooted. I killed the boot, and restarted QEMU to boot
into the 9front installation. However it hung just after reaching the
bootsector.
What was expected:
After printing that the PC is booting from the Hard Drive / MBR, 9front should
boot.
Steps to reproduce:
0. Install QEMU (I use Homebrew's bottled QEMU 2.1.2)
1. Download 9front ISO (9front-3853.02ebd469f43a.iso.bz2)
2. Create new QEMU image: `qemu-img create -f qcow2 9front.qcow2.img 20G`.
3. Boot 9front ISO: `qemu-system-i386 -hda 9front.qcow2.img -cdrom
9front-3853.02ebd469f43a.iso -boot d -vga std -m 1G`
4. Install 9front onto "9front.qcow2.img"
5. Press enter at [finish], QEMU reboots (into the ISO, since the QEMU call
hasn't changed)
6. Kill QEMU
7. Boot 9front installation: `qemu-system-i386 -hda 9front.qcow2.img -boot c
-vga std -m 1G`
8. Boot hangs
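The steps above can be collected into a small script. This is only a sketch: it prints the three commands from the report instead of running them, and the `IMG`/`ISO` variables and the helper function names are assumptions added for illustration, not part of the original report.

```sh
#!/bin/sh
# Sketch of the reproduction steps; file names are taken from the report.
IMG=${IMG:-9front.qcow2.img}
ISO=${ISO:-9front-3853.02ebd469f43a.iso}

# Step 2: create the 20G qcow2 image.
create_cmd() {
    echo "qemu-img create -f qcow2 $IMG 20G"
}

# Step 3: boot the installer ISO with the image attached as the first disk.
install_cmd() {
    echo "qemu-system-i386 -hda $IMG -cdrom $ISO -boot d -vga std -m 1G"
}

# Step 7: boot the installed system from the hard disk only.
boot_cmd() {
    echo "qemu-system-i386 -hda $IMG -boot c -vga std -m 1G"
}

# Print the commands in order; run each printed line by hand.
create_cmd
install_cmd
boot_cmd
```

Steps 4 to 6 (running the installer, pressing enter at [finish], killing QEMU) stay manual, so the printed lines are meant to be run one at a time.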
Original issue reported on code.google.com by alexchan...@gmail.com
on 28 Sep 2014 at 3:01
Attachments:
- [Screen Shot 2014-09-27 at 9.48.48 PM.png](https://storage.googleapis.com/google-code-attachments/plan9front/issue-213/comment-0/Screen Shot 2014-09-27 at 9.48.48 PM.png)
that seems odd. there are way too many dots here. the pbs
is responsible for loading the 2nd stage loader "9bootfat"
from the root of the 9fat partition.
boot the iso, and in a rio window, type:
9fs 9fat
then compare the files:
/n/9fat/9bootfat with /386/9bootfat
like:
ls -l /n/9fat/9bootfat /386/9bootfat
md5sum /n/9fat/9bootfat /386/9bootfat
they should be identical.
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 3:30
So /n/9fat/9bootfat didn't exist. I reran the installation, and I tried hjfs as
well, but it didn't help.
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 3:52
that would make sense, the pbs is scanning the root
directory looking for the file.
the 9fat partition is setup in the "bootsetup" step
of the install process. re-run this step and check
for error messages (scroll up if it scrolled away).
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 4:06
Okay, just reinstalled again. If I run `md5sum /n/9fat/9bootfat /386/9bootfat`
at the end of the installation, just before finishing and rebooting, then
/n/9fat/9bootfat is present, and the md5 sums match.
However, after rebooting, the boot still hangs. I'm stuck on the divide error
bug, but I bet /n/9fat/9bootfat wouldn't exist, if I could get to rio.
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 4:10
maybe we're just rebooting too fast, before qemu flushes its data
to the disk? at least you got the kernel booted now. the divide
by zero panic is caused by the stats(1) command reading /dev/sysstat
(this little graphing system statistics window).
you might just wait a bit before hitting enter on the bootargs
prompt to avoid this.
you can also try the kernel i just made that has the fix:
http://www.felloff.net/usr/cinap_lenrek/9pcf.alexchandel
you can copy it to 9fat renamed as 9pcf.
another thing, the 9front kernel is a multiboot image.
you can try loading it directly with qemu with the
-kernel option. plan9.ini (contents) can be passed as
-initrd option.
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 4:28
alexchandel: remember you need to run 9fs 9fat to mount the 9fat partition.
/n/9fat will not be mounted until you do so. also note: /n/9fat will only be
accessible from the same namespace where you run 9fs 9fat.
Original comment by stanley....@gmail.com
on 28 Sep 2014 at 4:29
Nice, I booted with 9pcf.alexchandel with the command: `qemu-system-i386 -hda
9front.qcow2.img -cdrom 9front-3853.02ebd469f43a.iso -boot d -vga std -m 1G
-kernel 9pcf.alexchandel -initrd plan9.ini`
As soon as the GUI is drawn, a screen with this error flashes:
Plan 9 Console
i8042: 08 returned to the ea command
It disappears quickly, and then there's a "kernel fault: no user process"
panic. I've attached a screenshot.
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 4:53
Attachments:
- [Screen Shot 2014-09-28 at 12.52.37 AM.png](https://storage.googleapis.com/google-code-attachments/plan9front/issue-213/comment-7/Screen Shot 2014-09-28 at 12.52.37 AM.png)
what is the content of the plan9.ini you passed to qemu?
Original comment by mischief@offblast.org
on 28 Sep 2014 at 5:03
@mischief It's:
config for initial cd booting
cdboot=yes
mouseport=ask
monitor=ask
vgasize=ask
bootfile=/386/9pcf
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 5:18
And yeah, I ran `9fs 9fat` before checking each time, and from within the same
window. In fact `/n/9fat` was empty. Also it's worth noting that for my past
three posts, I chose cwfs64x during installation.
When I use hjfs and boot with `qemu-system-i386 -hda 9front.qcow2.img -cdrom
9front-3853.02ebd469f43a.iso -boot d -vga std -m 1G -kernel 9pcf.alexchandel
-initrd plan9.ini`, the "panic: kernel fault: no user process" error doesn't
occur. However, `/n/9fat` is still empty.
Moreover, booting with `qemu-system-i386 -hda 9front.qcow2.img -boot c -vga std
-m 1G -kernel 9pcf.alexchandel` gives lots of errors, mostly along the lines of
"can't open, /rc not found".
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 5:40
To summarize, 9front appears to create the second stage bootloader on the hard
drive during installation, but after rebooting it's gone. Booting off the hard
drive hangs; it's only possible to boot off the ISO. Even booting off the hard
drive using a kernel image (thus skipping the bootloader) still fails.
Additionally, after installation, if the HD's filesystem is cwfs64x, booting
off the ISO will panic with "kernel fault: no user process".
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 7:30
decoded the panic, but it makes no sense. it would mean
that the machp[0] array contains 0x9 for the mach address
of cpu0. this entry gets only set once to a fixed
address and then is never touched.
term% ktrace -i f0108507 f0015b24
src(0xf0108507); // dumpstack+0x10
// data at 0xf0015b2c? f0163141
src(0xf0163141); // panic+0xd2
// data at 0xf0015c54? f010867a
src(0xf010867a); // fault386+0xd2
// data at 0xf0015d04? f0107c14
src(0xf0107c14); // trap+0x15b
// data at 0xf0015dc4? f01005ec
src(0xf01005ec); // forkret
//passing interrupt frame; last pc found at sp=0xf0015dc4
// data at 0xf0015e04? f013882e
src(0xf013882e); // ps2mouseputc+0x19
// data at 0xf0015e38? f01f2add
src(0xf01f2add); // i8042intr+0x7a
// data at 0xf0015e58? f0107c14
src(0xf0107c14); // trap+0x15b
// data at 0xf0015f18? f01005ec
src(0xf01005ec); // forkret
//passing interrupt frame; last pc found at sp=0xf0015f18
// data at 0xf0015f58? f010055b
src(0xf010055b); // halt+0xe
// data at 0xf0015f64? f015d946
src(0xf015d946); // idlehands+0x11
// data at 0xf0015f70? f020bee6
src(0xf020bee6); // runproc+0x160
// data at 0xf0015fa4? f020b6b5
src(0xf020b6b5); // sched+0x165
// data at 0xf0015fd0? f020b463
src(0xf020b463); // schedinit+0x85
// data at 0xf0015fe4?
acid: src(0xf013882e); // ps2mouseputc+0x19
/sys/src/9/pc/mouse.c:99
94 int buttons, dx, dy;
95
96 /*
97 * Resynchronize in stream with timing; see comment above.
98 */
>99 m = MACHP(0)->ticks;
100 if(TK2SEC(m - lasttick) > 2)
101 nb = 0;
102 lasttick = m;
103
104 /*
acid: asm(ps2mouseputc)
ps2mouseputc 0xf0138815 SUBL $0x28,SP
ps2mouseputc+0x3 0xf0138818 MOVL packetsize(SB),DI
ps2mouseputc+0x9 0xf013881e MOVL nb$1(SB),SI
ps2mouseputc+0xf 0xf0138824 MOVL c+0x0(FP),BX
ps2mouseputc+0x13 0xf0138828 MOVL machp(SB),AX
ps2mouseputc+0x19 0xf013882e MOVL 0x24(AX),BP <- fault
ps2mouseputc+0x1c 0xf0138831 MOVL BP,CX
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 3:24
ok, i could reproduce this now with many tries in qemu for windows.
the trick is to keep twitching the mouse on boot constantly. fix
committed in rd2af87472b59. see the explanation there. i built
another kernel for you to test under:
http://www.felloff.net/usr/cinap_lenrek/9pcf.alexchandel
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 4:35
- Changed state: NeedsTesting
The panic no longer occurs. However, I just noticed some abnormalities during the
install:
Ream the filesystem? (yes, no)[yes]
Starting cwfs64x file server for /dev/sdC0/fscache
Reaming filesystem
bad nvram key
bad authentication id
bad authentication domain
nvrcheck: can't read nvram
config: config: config: auth disabled
config: config: config: config: config: config: config: currnt fs in "main"
cmd_users: cannot access /adm/users
63-bit cwfs as of Thu Sep 4 20:04:10 2014
last boot Sun Sep 28 17:06:33 2014
Configuring cwfs64x file server for /dev/sdC0/fscache
% mount -c /srv/cwfs /n/newfs
Mounting cwfs64x file server for /dev/sdC0other
% mount -c /srv/cwfs /n/other other
The bootsetup still appears error free:
dossrv: serving #s/dos
% dd -bs 512 -count 1 -if /dev/sdC0/9fat -of /tmp/pbs.bak
1+0 records in
1+0 records out
Initializing Plan 9 FAT partition
% disk/format -r 2 -d -b /n/newfs/386/pbs /dev/sdC0/9fat
Initializing FAT file system
type hard, 12 tracks, 255 heads, 63 secors/track, 512 bytes/sec
used 4096 bytes
% mount -c /srv/dos /n/9fat /dev/sdC0/9fat
% rm -f /n/9fat/9bootfat /n/9fat/plan9.ini /n/9fat/9pcf
% cp /n/newfs/386/9bootfat /n/9fat/9bootfat
% chmod +al /n/9fat/9bootfat
% cp /tmp/plan9.ini /n/9fat/plan9.ini
% cp /n/newfs/386/9pcf /n/9fat/9pcf
% cp /tmp/pbs.bak /n/9fat
% unmount /n/9fat
Regardless, /n/9fat is still empty when I reboot and run "9fs 9fat". And
attempting to boot into the HD still hangs at "MBR...pbs....."
Moreover, attempting to boot into the HD using the kernel flag
(`qemu-system-i386 -hda 9front.qcow2.img -boot c -vga std -m 1G -kernel
9pcf.alexchandel`) throws bad nvram key errors and more, screenshot attached.
When I type `ls` in the terminal, it errors with:
checktag pc=9b4f cw"/dev/sdC0/fscache"w"/dev/sdC0/fsworm"(11305)
tag/path=Tnone/0; expected Tdir
ls: . :phase error -- cannot happen
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 5:48
Attachments:
- [Screen Shot 2014-09-28 at 1.45.31 PM.png](https://storage.googleapis.com/google-code-attachments/plan9front/issue-213/comment-14/Screen Shot 2014-09-28 at 1.45.31 PM.png)
the messages from the installation are expected. these are ok.
but after reboot, the fat is missing and the cwfs filesystem
is partially corrupted. my guess would be that we're just too
fast in rebooting? and qemu doesn't flush the changes out to
the qcow image for some reason?
reads and writes to /dev/sdXX/parts are uncached and synchronous.
plan9 kernel has no buffer caches. and dossrv writes immediately.
maybe qemu expects us to issue write barriers to really
flush stuff to the disk?
maybe just wait a minute after installation when it prompts
for the [finish] step?
i can try checking qemu source in the meantime...
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 6:04
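One way to test the flush theory from the host side would be to change QEMU's cache mode for the disk. The sketch below is an assumption, not something tried in this thread: it prints invocations that replace `-hda` with an explicit `-drive` using `cache=writethrough`, which asks QEMU to push each write through to the image file rather than waiting for the guest to request a flush. The `drive_arg` helper and the `IMG`/`ISO` variables are hypothetical names.

```sh
#!/bin/sh
IMG=${IMG:-9front.qcow2.img}
ISO=${ISO:-9front-3853.02ebd469f43a.iso}

# -drive equivalent of "-hda $IMG", plus an explicit cache mode ($1).
drive_arg() {
    echo "file=$IMG,format=qcow2,index=0,media=disk,cache=$1"
}

# Install run: the writes that later go missing happen here.
echo "qemu-system-i386 -drive $(drive_arg writethrough) -cdrom $ISO -boot d -vga std -m 1G"
# Boot run from the hard disk.
echo "qemu-system-i386 -drive $(drive_arg writethrough) -boot c -vga std -m 1G"
```

If the 9fat contents survive a reboot with this setting but not with the default, that would point at missing flushes rather than at the installer itself.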
short explanation of what checktag messages are:
the cwfs fileserver uses blocks (of 16k in case of
cwfs64x) where it stores some redundant checking
info at the end (the tag). the tag contains the
type of the block (file-data/directory/indirect
pointer blocks...) and the qid (file number). it
always checks the tag to see that the block it just
read is what it expected.
a tag of Tnone/0 means the tag is zero. the block
appears to be zeroed out. ... like it was never
written.
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 6:10
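The "zeroed out" case described above can be approximated from the host. The sketch below assumes a raw (not qcow2) disk image and checks whether one block of a given size contains only zero bytes. The `is_zero_block` helper is hypothetical, and block numbers reported by cwfs are relative to its own partition, so mapping a checktag address to an offset in the image is not worked out here.

```sh
#!/bin/sh
# Succeeds if block number $2 (block size $3 bytes) of file $1 is all zero bytes.
# A block that is short or past the end of the file fails the check.
is_zero_block() {
    # Total bytes actually read for this block.
    n=$(dd if="$1" bs="$3" skip="$2" count=1 2>/dev/null | wc -c | tr -d ' ')
    # Bytes left over after deleting every NUL byte.
    nz=$(dd if="$1" bs="$3" skip="$2" count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
    [ "$n" -eq "$3" ] && [ "$nz" -eq 0 ]
}
```

With the 16k block size mentioned for cwfs64x, a call would look like `is_zero_block disk.raw 0 16384`, where `disk.raw` is a placeholder file name.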
I waited ~20 minutes, same result. Is it possible that 9front is corrupting the
filesystem when it's shut down? Zeroed out blocks might be a result of qcow2
corruption.
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 7:38
cwfs writes changes to disk lazily. that is, there's a background
process that flushes dirty blocks to disk. but waiting 20 minutes
is a bit crazy. it should be a few seconds at max. even with
qemu's slow i/o, not more than 10 seconds max.
dossrv on the other hand writes immediately. the write() syscall
will not return until dossrv did the whole roundtrip to disk.
what puzzles me is that your fat filesystem is missing.
this corruption cannot be explained with the lazy writing
of cwfs.
maybe it has something to do with the qemu configuration? can
you try using a sparsefile for the disk image? maybe the
qcow got damaged with all this testing?
people have been using qemu with 9front for a while now, but these
issues haven't come up yet.
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 7:58
another theory. maybe the ide controller that qemu emulates
doesn't work right?
you could try using virtio instead.
Original comment by cinap_le...@felloff.net
on 28 Sep 2014 at 8:01
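The suggestions in the last two comments (check the qcow2 image, try a sparse raw file, try virtio) could be exercised roughly as follows. This is a sketch under assumptions: it prints commands rather than running them, the `QCOW`/`RAW`/`ISO` variables and helper names are hypothetical, and with `if=virtio` the disk may show up in 9front under a different device name than sdC0.

```sh
#!/bin/sh
QCOW=${QCOW:-9front.qcow2.img}
RAW=${RAW:-9front.raw.img}
ISO=${ISO:-9front-3853.02ebd469f43a.iso}

# Ask qemu-img to look for inconsistencies in the existing qcow2 image.
check_cmd() {
    echo "qemu-img check $QCOW"
}

# Create a raw image to install into instead of qcow2
# (it is usually stored as a sparse file on the host).
raw_create_cmd() {
    echo "qemu-img create -f raw $RAW 20G"
}

# Installer run against the raw image; $1 is the disk bus: ide or virtio.
raw_install_cmd() {
    echo "qemu-system-i386 -drive file=$RAW,format=raw,if=$1 -cdrom $ISO -boot d -vga std -m 1G"
}

# Boot the installed system from the raw image on the same bus.
raw_boot_cmd() {
    echo "qemu-system-i386 -drive file=$RAW,format=raw,if=$1 -boot c -vga std -m 1G"
}

check_cmd
raw_create_cmd
raw_install_cmd virtio
raw_boot_cmd virtio
```

Running the same pair with `ide` instead of `virtio` would separate the two theories: a raw image on ide tests the image format, a raw image on virtio also swaps out the emulated controller.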
Just noticed, when I restart QEMU by entering `fshalt` in 9front, and then
`system_reset` in the QEMU console, the filesystem is preserved, and /n/9fat
has its contents. However, if I restart QEMU in *any* other way, including
killing it while 9front is idle, then the filesystem is corrupted.
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 8:18
http://wiki.qemu.org/Features/Qcow2DataIntegrity recommends using I/O barriers
to avoid data corruption.
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 8:19
Nevermind, I was using the wrong image. `fshalt`/`system_reset` still results
in a corrupted filesystem.
Original comment by alexchan...@gmail.com
on 28 Sep 2014 at 10:13
any progress here? is this still reproducible?
Original comment by mischief@offblast.org
on 28 Dec 2014 at 8:13
The newest ISO, 9front-4045 still exhibits the same hanging behavior, when the
reported steps are performed. (install, [finish], QEMU restarts, kill QEMU,
restart QEMU without cdrom arg, hangs at boot)
Original comment by alexchan...@gmail.com
on 1 Jan 2015 at 7:33
are you still using qemu 2.1.2, and the same qemu arguments as in the original
bug report? i can try to reproduce on this version, but i only have linux to
test on. i have never had a problem like you described, and i've tried quite a
number of qemu versions during ethervirtio development.. it could be an
osx-specific issue, or an issue with how brew packages qemu..
Original comment by mischief@offblast.org
on 2 Jan 2015 at 12:57