/prgs-os-linux-linboothkvc

Reboot a linux system into a fresh instance without physically rebooting it, using the power of linux modules. Roughly speaking kexec without kexec ;-)

Readme - linboothkvc
v24Jan2012_1953
HKVC, GPL
A linux kernel module based bootstrap loader
----------------------------------------------

>> 24Jan2012_2113 <<

REALITY BITES BACK - I had a known issue, which I had set aside at the
time of the v1.0 release with the wishful thinking that, as I had fully
flushed the Caches just shortly before disabling them, there was not
much need to flush the cache (i.e. clean and invalidate) again after
the disable. (I did have a lingering feeling that I should call it
again, but I was lazy as usual, and on BeagleBoard and Pandaboard it
worked without a flaw, so I took a chance.) Also, ideally the code which
re-enables the Cache, like the new linux kernel or whatever else runs in
the Nirvana environment, should take care of doing a thorough cache
invalidate before actually enabling the Cache again. With those thoughts
and laziness I had left out calling the flush again after the disable.

However on NookTab, once I used more memory while entering into the
linux kernel and ramdisk in the linboothkvc Nirvana environment, reality
bit back hard at me, with even the proc_info_list itself not being read
properly and thus __lookup_processor_type failing in the kernel. So I
went back and enabled additional flushing after the Cache disable in
Proc_Finish and voila, the kernel is very much more alive now :-)
on NookTab, except for needing some small love here and there. So now
with LinBootHKVC on NookTab, a lot more people can successfully
experiment and get a live and kicking linux kernel up and running
happily without much difficulty.

So put differently, at least in the android linux 2.6.35 source there is
some issue/corner case with thorough cache invalidation before the cache
gets enabled or before the kernel does anything significant. However now
it is taken care of at cache disable time itself, which is also a good
thing.

Found that the effect of the -save-temps flag on kernel module
compilation is actually visible in the kernel source directory. I was
originally expecting to see the intermediate files in the kernel module
directory. I had given up hope of getting access to the intermediate
files and used other methods to pinpoint the compilation issues, but
now I have access to them, if and when required in future.

The Virtual uart addresses which I had hardcoded in OmapNirvana2.S,
even though allocated using ioremap, actually always get mapped to the
same place, because there is a default mdesc.map_io based hardwired
mapping for io, which is set up during kernel booting. However in
OmapNirvana3.S I have removed the writing to the Virtual uart addr,
because what I wanted to debug using it is already achieved. With this
knowledge, if required I can go back to using these hardcoded virtual
addresses for uart as long as the mach-omap2/io.c mappings don't change.

Also scripts/setlocalversion in linux kernel source may require
patching if you want to use git with the BN source and still use
the same source with changes between kernel build and module
builds. Just a note to myself.

>> 22Jan2012_2300 <<

linux-kernel
--------------
The zImage entry point which decompresses the kernel, relocates it
and sets up cache is working properly.

However after it jumps to stext using call_kernel, somewhere beyond
this point the kernel is failing (maybe some SMC call for L2 cache
related stuff or some such thing, I have to debug this more later).

UPDATE: 24Jan2012_1757: Rather, in the linboothkvc environment the kernel
fails in __lookup_processor_type, reached from arch/arm/kernel/head.S
(the routine itself is in head-common.S).

u-boot
---------

At one level u-boot works directly off the shelf in the linboothkvc
environment when started from x-loader. However to get good use of it:

a) All the SMC calls need to be bypassed.
b) It doesn't implement the bootm command; this will have to be
   implemented if required.
c) Some other cleanups and updates wrt env variables for the u-boot env
   etc. are needed.

>> 22Jan2012_0015 <<

The following had to be disabled/bypassed in x-loader to get it running in
linboothkvc Nirvana environment

a) Disabling majority of use of memory around 0x4030.xxxx
   - None of them affect x-loader functionality
   - To take care of few of the required functionality the load address for
     X-loader was changed to 0x8000.7ff0 in place of 0x4030.4350
   - Area corresponding to STACK offset in this memory didn't create any
     problem.
b) Setting up MPU and IVA dpll clocks
   - Doesn't affect x-loader functionality directly
   - Haven't checked at linux kernel level yet, but don't see a reason why
     it should affect it.
   - Not sure if us running from DDR memory created the problem here, as
     there is a possibility that the DDR memory clock is derived from these
     clocks. I haven't looked at the Clock hierarchy of Omap4 currently.
c) Initialisation of DDR
   - Again doesn't affect, as we are already running in DDR. Also, because
     we are already running in DDR memory, trying to reinitialise it could
     potentially have created the problem.
d) Call to smc to enable L2 (PL310) cache
   - Again doesn't affect x-loader directly
e) Few cross check calls to SEC_ENTRY_Std_Ppa_Call
   - Doesn't affect directly

NOTE1: It expects the u-boot.bin to be called u-bootk.bin in MMC/SD.
NOTE2: It expects dummy data of 0x120 prepended to actual u-boot.bin file
       when creating the u-bootk.bin
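
NOTE2 can be sketched in C as below. This is only a minimal sketch and not
the tool actually used here: the helper name make_ubootk is hypothetical,
and filling the 0x120 bytes with zeros is an assumption, since the note only
says the data is dummy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Size of the dummy data that NOTE2 says goes in front of u-boot.bin. */
#define UBOOTK_DUMMY_LEN 0x120u

/*
 * Hypothetical helper: build a u-bootk.bin image in memory from the
 * contents of u-boot.bin. The dummy bytes are zero-filled (assumption).
 * Returns a malloc'ed buffer which the caller frees, and stores its size
 * in *out_len. Returns NULL if the allocation fails.
 */
unsigned char *make_ubootk(const unsigned char *uboot, size_t uboot_len,
                           size_t *out_len)
{
    assert(uboot != NULL && out_len != NULL);

    unsigned char *img = calloc(1, UBOOTK_DUMMY_LEN + uboot_len);
    if (img == NULL)
        return NULL;

    /* Dummy area stays zero; the real u-boot.bin starts at offset 0x120. */
    memcpy(img + UBOOTK_DUMMY_LEN, uboot, uboot_len);
    *out_len = UBOOTK_DUMMY_LEN + uboot_len;
    return img;
}
```

Writing the returned buffer out as u-bootk.bin on the first partition of the
SD card is left to the caller.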

HOWEVER what I have found is that the x-loader provided by BN (from Ti) is
very severely limited wrt handling MMC devices in NON RAW MODE. So currently
if you use the one bundled as NChild in linboothkvc or the one from the BN
source dump, it won't load the u-bootk.bin.

I will look at patching this up in the next few days. This is a thing which
I hadn't originally expected. Not that it is complicated or anything like
that, it is pretty straightforward, but it does require some time to be
spent on it, which I hadn't originally planned for. Also, maybe using the
latest Ti x-loader for Omap4 could turn out to be the shortcut for myself
or anyone else interested in experimenting.

UPDATE: Rather, even though the x-loader from BN doesn't have all the latest
u-boot features or even some of the x-loader features, it still has basic
support for loading files from a fat filesystem, with certain assumptions,
like it will only work with the 1st partition on a MBR based SD card, and a
few other minor issues. But that in itself is good enough.

>> 21Jan2012_0125 <<

No additional changes required on Omap4/Nooktab compared to Omap3/Beagle at a
fundamental level.

However on NookTab the SRAM at 0x4030.xxxx (Rather as of now I have tried only
0x4030.4350 - the x-loader default load address) seems to be locked out by HS
or I am doing something wrong, which I have to cross check later.

Also in the interim, I implemented logic for L2X0 cache flush and for using
the ARM CP15 Address translation support instruction. I did this because of
the 0x4030.xxxx address issue, which I didn't recognise initially, even
though it was in front of my eyes, because I committed the cardinal sin of
adding/changing too many things in one shot before testing on the target.

Also the x-loader for omap4 (haven't cross checked this particular part on Omap3
recently, but don't remember it having this argument passed thro r0 logic.)
expects a meta data structure to be passed thro R0 register which gives it
info wrt boot device and boot mode. So one has to patch x-loader appropriately.

>> 16Jan2012_2154 <<
:-) SUCCESS ON ACTUAL HARDWARE :-) (not _the_ h/w yet ;-)

For Arm Cortex_A8 based SOCs (tried with Dm3730)
Hopefully a similar conclusion should work for Omap4 successfully, except for any
unknown (to me currently) SMP issues, in spite of me shutting down the 2nd processor.
------------------------------------------------------------------------------
After further experimentation, digging thro docs and frustration, I have finally
realised that MVA based Cache maintenance operations aren't sufficient for
flushing updated data from Cache to Memory, because the POU (Point of Unification)
or POC (Point of Coherency) in many cases terminates at L2 or some other Cache
level rather than Memory.

Now MVA based cache flush operations are fine for normal linux kernel operations,
but for getting into a Nirvana (new pristine execution environment) setup with
caches disabled, this is not good enough as the changes don't hit the memory.

Leaving aside the linboothkvc related memory operations for Nirvana or NChild
code, even the kernel module loader's symbol resolution updates to the
kernel module code don't get saved to memory - because even it uses
flush_icache_range, which in turn uses v7_coherent_kern_range, which in turn
uses MVA based cache flush operations.

So in conclusion we require a cache flush logic which uses set/way based cache
maintenance operations and in turn walks thro all the cache hierarchy levels.
The flush_kern_all routine provides the required operation to achieve the same.
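
To make the set/way walk concrete, here is a minimal host-side sketch in C of
how the operand for an ARMv7 set/way operation is put together and how every
level gets walked. It is not the kernel's flush_kern_all code; the names
setway_operand, walk_all_levels and struct cache_level are hypothetical, and
on a real target the cache dimensions would be read from CLIDR/CCSIDR instead
of being passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest n such that (1u << n) >= x, for x >= 1. */
static unsigned ceil_log2(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && (UINT32_C(1) << n) < x)
        n++;
    return n;
}

/*
 * Build the register value for an ARMv7 data cache operation by set/way,
 * for example DCCISW (clean and invalidate by set/way):
 *   way number in the top bits, set number starting at log2(line size),
 *   cache level (0 for L1, 1 for L2, ...) in bits [3:1].
 */
uint32_t setway_operand(unsigned level, uint32_t ways, uint32_t line_bytes,
                        uint32_t set, uint32_t way)
{
    assert(level < 7 && ways >= 1 && way < ways);
    unsigned a = ceil_log2(ways);       /* bits needed for the way number */
    unsigned l = ceil_log2(line_bytes); /* set index starts at this bit   */

    uint32_t v = (set << l) | ((uint32_t)level << 1);
    if (a != 0)                         /* avoid an undefined shift by 32 */
        v |= way << (32 - a);
    return v;
}

/* Hypothetical description of one cache level. */
struct cache_level {
    uint32_t sets, ways, line_bytes;
};

/*
 * Walk every set and way of every level, calling op() once per cache line.
 * On the target, op() would issue "mcr p15, 0, <operand>, c7, c14, 2"
 * (DCCISW), followed by a DSB once the walk is over. op may be NULL, in
 * which case the lines are only counted. Returns the number of lines.
 */
unsigned long walk_all_levels(const struct cache_level *lv, unsigned nlevels,
                              void (*op)(uint32_t operand))
{
    unsigned long count = 0;
    for (unsigned level = 0; level < nlevels; level++)
        for (uint32_t way = 0; way < lv[level].ways; way++)
            for (uint32_t set = 0; set < lv[level].sets; set++) {
                uint32_t v = setway_operand(level, lv[level].ways,
                                            lv[level].line_bytes, set, way);
                if (op != NULL)
                    op(v);
                count++;
            }
    return count;
}
```

The point of the sketch is that nothing in the operand is an address, so the
walk cleans each level in full regardless of where the POU or POC sits.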

Currently I am using my understanding of the Omap3 bootrom code, based on the
dump of the same in qemu-linaro as well as on Dm3730 using openocd and xds100v2,
to jump to the new bootloader (i.e. NChild, which is x-loader currently) from
Nirvana code. So this will work with BeagleXM. However for NookTab, I have
to look at the Omap4 bootrom, for which I haven't looked at jtag access currently.

But once I have setup the proper Nirvana code for NookTab, the logic should
work 99% on it, as my finding/understanding of Cache issue which I faced
seems to indicate the same unless SMP adds some additional complexity, which
seems less likely currently to me.

NOTE: TOCHECK_LATER: If I put a busy loop of sufficiently long duration in the
Nirvana Code, then the system seems to go haywire and abort. Maybe some
watchdog timer is trying to kick in or ?????, have to debug later.

NOTE: In ARM Clean cache means flush the data from Cache to next cache level
or memory and Invalidate cache means make the given cache line entry invalid,
so that a fresh access to the address will force a read from the next cache
level or memory.

>> 16Jan2012_0138 <<

There is a VA-to-PA translation instruction provided by ARM CP15 system
coprocessor.

There are simple sys, msr and mrs (if I remember the names correctly) instructions
available instead of the mcr and mrc instructions.

The swi instruction has been renamed svc.

For cache and if required TLB flushing MVA based instructions of CP15 will
be the simplest (i.e where the MemoryVirtualAddress is provided).

NOTE: Till date the issues I have found, yet to resolve have been related
to Cache flushing (is it called invalidate or clean or ... in Arm) and
not affected by TLB.

Have to check the relation between TLB and PageTables and Hardwired and
Software controlled in ARM in depth later.

>> 15Jan2012 <<

The issue in the last release with disabling the mmu from within the kernel
module not working has been fixed.

The actual bug was that I had wrongly used mrc after bic instead of mcr (mrc
reads the CP15 register and mcr writes it, so the modified value never got
written back). However it wouldn't have worked 100% on beaglexm anyway, because
kernel modules get loaded at a virtual address around 0xbf00.0000, which is well
beyond the physical memory addresses available (i.e. with actual DRAM) in
beaglexm (which is only 512MB, i.e. 0x8000.0000+512MB). So setup-mm-for-reset/reboot
wouldn't have set up a 1-to-1 (s-to-s) map for this virtual address, and once the
mmu got disabled the processor would have messed up with an exception/????

So for now it is only safe to disable the mmu from within Nirvana code, and not
from within kernel module code. And the nirvana code omap3callbootrom1.S
already takes care of this mmu disable properly.
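
The address arithmetic behind this can be checked with a small C sketch. It
only encodes the numbers given above for beaglexm. The function name
fetchable_after_mmu_off is hypothetical, and the sketch assumes that once the
mmu is off the program counter value is used directly as a physical address,
so the code only keeps running if that address falls inside real DRAM.

```c
#include <stdbool.h>
#include <stdint.h>

/* BeagleXM numbers taken from the note above. */
#define DRAM_BASE  UINT32_C(0x80000000)
#define DRAM_SIZE  (UINT32_C(512) << 20)   /* 512MB */
#define MODULE_VA  UINT32_C(0xbf000000)    /* around where modules get loaded */

/*
 * Hypothetical check: with the mmu off, is this program counter value
 * backed by actual DRAM? (Assumes DRAM is the only memory of interest.)
 */
bool fetchable_after_mmu_off(uint32_t pc)
{
    return pc >= DRAM_BASE && (pc - DRAM_BASE) < DRAM_SIZE;
}
```

With these numbers fetchable_after_mmu_off(MODULE_VA) is false, because DRAM
ends at 0x9fff.ffff, while an address inside DRAM such as 0x8000.8000 passes.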

Also found that Bkpt instruction doesn't work between qemu and gdb with target
remote combination. Haven't checked with openocd+gdb and actual h/w yet.

>> 15Jan2012 <<

Have completed the qemu beaglexm based linboothkvc (kexec) FULLY now. Now from
within a running linux system in qemu, you can restart a new linux kernel and its
user space.

For this it currently uses the x-loader itself to bootstrap u-boot.bin and in turn
the linux kernel, similar to how the hardware bootrom does it on power on. This
new x-load.bin image is embedded within the kernel module.

I have updated the kernel module such that it takes 2 images into itself.

Image 1 == The Nirvana Code (Example - bloop?.S, omap3callbootrom1.S, ...)
This gets passed control once the kernel module is done with disabling interrupts,
disabling the mmu (something seems to be missing wrt this with the latest changes
I did to accommodate Nirvana and Nirvana's child, have to debug later), setting up
the 1 to 1 mmu map, etc.

Currently the nirvana code omap3callbootrom1.S in turn takes care of disabling the
mmu, interrupts etc. once again just to be safe. It then loads the NirvanaChild code
into 0x4020.0800, similar to how the Omap3 bootrom loader does its job, and calls
the entry point at 0x4001.4000 in the bootrom. This takes care of setting up the
stack and calling the NirvanaChild Code.

Image 2 == The NirvanaChild Code (Example - x-load.bin (without signGP))
The address and length of this code are passed to the Nirvana Code thro r0 and r1
respectively. It is up to the Nirvana code to decide what it will do with this image.

As of today only the omap3callbootrom1.S nirvana code handles this image as required
while the older nirvana codes don't know about this.

NOTE: Currently the new x-load.bin is expected to be at
misc/Binaries/HKVC-x-load.bin in the linboothkvc subfolder.

If you want to experiment with a new x-loader or any other bootloader, then copy it
to above mentioned location and then run the following.

./hkvc-make clean
./hkvc-make nchild
./hkvc-make asm
./hkvc-make

OR if you only want to recompile the kernel module for your running kernel on qemu

./hkvc-make

Next copy the lbhkvc_km_DEVICE_BEAGLEXM.ko to the linux running in qemu (for which
you compiled the kernel module), insmod the kernel module and see the magic :-)

>> 14Jan2012 <<

Step1 [DONE]: Basic logic working in QEmu BeagleboardXM correctly now. Because Qemu doesn't
simulate the Cache fully, the bug related to cache handling doesn't affect it. Qemu based
snooping allowed me to identify my stupid mistake of not setting flags in my nirvana code
samples. It also helped me to run thro the boot flow as well as a few other aspects of the
shutdown process fully, like identifying that the MMU wasn't disabled by the kernel support
routine, among others. (Now come on, who wants to read thro the code carefully to identify
these, when one can experiment with the flow in qemu (lazy me) :-) My only peeve with Qemu
is the rather indirect and to some extent arbitrary way it maps the serial port to stdio
or ptys.


Step2 [Next]: Next have to fix my known bugs and try on an actual BeagleboardXM, now that
I got hold of an xds100v2 yesterday and in turn am able to debug Beagle using openocd.
Also, rather than calling the hidden kernel functions, put the required asm code in directly.

Step3 [After]: With the above two, most probably the code should run straight on Nook also
as I have already taken care of shutting the 2nd proc down. Any additional SMP sync logic
if any in chip to be looked at beyond the ISB and DSB syncing wrt current cpu.

NOTE: jtag access has allowed me to confirm my suspicions wrt cache


>> 08+Jan2012 <<
Moving away from directly trying on NookTab to experiment with qemu initially to help
debug the code easily (instead of requiring to experiment with uart prints).


Bugs - TOFIX
---------------

>> 14Jan2012 <<

The code to execute, which is copied to a kmalloced location, doesn't always seem to be
flushed to memory successfully. This is also one serious issue affecting my kexec working
currently. Have to look properly tomorrow at the cache flushing kernel function which I
have used and if required replace it, or else move the code which copies to the final
location into my innermost kexec code, after it disables interrupts and calls DSB and ISB
(just to be safe, even though DSB and ISB don't seem to be helping here).

Also the patch to the kernel module by insmod support logic in kernel to resolve the
external symbols used in the module itself seems to be NOT fully flushed back into memory
from cache sometimes.



MISC
---------------

>> 12Jan2012 <<

***N*** Omap3 Bootrom
0x40014000 - start of BootRom
0x40014040 - Start Addr saved ?
0x40014044 - Boot message structure


>> 13Jan2012 <<

***N*** Omap3730 linux kernel after proc_fin and setup_mm_for_reboot

**** SCTLR
IF
	MRC p15, 0, <Rt>, c1, c0, 0
THEN
	Rt = 0x10c52c79 (QEmu-BeagleXM)
	Rt = 0x10c52c79 (NookTab)
	Rt = 0x10c52879 (BeagleXM)
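
The SCTLR values above can be decoded with a few bit tests. This is a minimal
C sketch using the ARMv7-A SCTLR bit positions. The macro and function names
are mine and not from the linboothkvc source. Bit 10 is labelled SW here, which
is its meaning on cores with the Multiprocessing Extensions; treating it that
way for all three boards is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-A SCTLR bits relevant to the values listed above. */
#define SCTLR_M   (UINT32_C(1) << 0)   /* MMU enable                    */
#define SCTLR_C   (UINT32_C(1) << 2)   /* data/unified cache enable     */
#define SCTLR_SW  (UINT32_C(1) << 10)  /* SWP/SWPB enable (MP ext only) */
#define SCTLR_Z   (UINT32_C(1) << 11)  /* branch prediction enable      */
#define SCTLR_I   (UINT32_C(1) << 12)  /* instruction cache enable      */
#define SCTLR_V   (UINT32_C(1) << 13)  /* high exception vectors        */

/* True if the MMU is still enabled in this SCTLR value. */
bool sctlr_mmu_on(uint32_t sctlr)    { return (sctlr & SCTLR_M) != 0; }

/* True if the data cache is still enabled. */
bool sctlr_dcache_on(uint32_t sctlr) { return (sctlr & SCTLR_C) != 0; }

/* True if the instruction cache is still enabled. */
bool sctlr_icache_on(uint32_t sctlr) { return (sctlr & SCTLR_I) != 0; }
```

For 0x10c52c79 this gives mmu on, data cache off and instruction cache off,
which matches the 14Jan2012 note that the MMU wasn't disabled by the kernel
support routine. The real BeagleXM value differs from the other two only in
bit 10.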