/parallelized_kernel_init

Patches to parallelize (speed up) the Linux kernels initialization sequence

This repository is meant to conserve patches I've written in order to
parallelize the Linux kernels initialization.

This patches have been the third evolution of an idea I've had,
implemented and posted early 2014.

Unfortunately the heads of the Linux kernel have been quite fast
to ensure me that they don't want that (without having had a real look
at what I've done). I've no idea why, but I'm sure they haven't tested
it before sending out their responses about the patch series.

Anyway, in case someone has an interest in reducing the boot time of a
Linux kernel, I've put the patches into this repository. They are based
on kernel 4.2(.3) and currently I don't know if I will maintain them for
later kernels (also it should be easy to rebase them).

You can find the introductory mail to the patch series below. Some more
information about the backgrounds can be found in the older two threads
which can be found by searching LKML for the threads

"dt: dependencies (for deterministic driver initialization order based on
the DT)"

and

"deps: deterministic driver initialization order".


Alexander Holler, November 2015

Update May 2016:

I'm using these patches now on Intel and AMD systems as well on ARM
system.
Of course, I still only have added dependencies for drivers I'm
using with my hand-made kernel configs. That means, depending on your
kernel configuration, some necessary dependencies between statically
linked drivers might still be missing.

------------------------- introductory mail -------------------------
Hello,

here is the newest version of my patches to use a dependency based
initialization order. It now works without DT too.

Background:

Currently initcalls are ordered by some levels and the link order. This
means whenever a file is renamed, changes directory or a Makefile is
modified the order with which initcalls are called might change. This
might result in problems. Furthermore, the required dependencies are
often not documented, sometimes there are comments in the source or in a
commit message, but most often the knowledge why a specific initcall
belongs to a specific initcall level isn't obvious without carefully
examing he source. And initcalls are used by drivers and subsystems, and
the count of both have grown quiet a lot in the last years. So it's
rather difficult to maintain a proper link order.
Another problem is that the link order can't be modified dynamically at
boot time to include dependencies dictated by the hardware. To circumvent
this, a brute-force trial-and-error mechanism called deferred probes has
been introduced, but this approach, while beeing KISS, has its own
problems.

To solve these problems I've written patches to use a topological sort at
boot time which uses dependencies to calculate the order with which
initcalls are called.

Why? What are the benefits (assuming correct dependencies are available)?

- It offers a clear in-source documentation for dependencies between
  initcalls.
- It is robust in regard to file or directory name changes and changes in
  a Makefile.
- If enabled, the order with which drivers for interfaces are called
  (e.g. network interfaces, hard disks), can be defined independent of
  the link order. These might result in more stable interface names or
  numbers.
- If enabled, it makes the the deferred probes obsolete, which might
  result in faster boot times.
- If enabled, it is possible to call initcalls in parallel. E.g. the
  shipped kernel for Fedora 21 (4.1.7-100.fc21.x86_64) contains around
  560 initcalls. These are all called in series. Also some of them use
  asynchronous stuff by themself, most don't do.

Drawbacks:

- It requires a small amount of time to calculate the order a boot time.
  But this time is most often smaller than the time saved by using
  multiple cores to call initcalls or by not needing deferred probes.
- Dependencies are required. For everything which can be build as a
  module, looking at modules.dep might give some pointers. Looking at
  the help from menuconfig also might give some pointers. But in the
  end, the most preferable way would be if maintainers or other people
  which have a deeper knowledge about the source and functionality
  would add the dependencies.

And last but not least, these feature is totally optional. If disabled,
nothing changes.


Some words about the patches:

Patch 1 introduces annotated initcalls, patch 2 uses them and patch 3
adds hardware specific dependencies from the device tree to the existent
set of dependencies.

Patch 4 and 5 are examples about how to achieve a stable initialization
order for various type of drivers. The two patches are doing this for
network interfaces (based on the existing link order) and I2C busses
(based on the new driver IDs). I leave it for discussion which one should
be used. My suggestion would be to order the IDs like the link order and
then use the order based on IDs. That would get rid of the link order while
also beeing downwards compatible.

Patches 6-9 do contain changes for dtc to enhance the binary blob (dtb)
with type information for phandles (the dependencies used by patch 3).
I've already posted these patches a year ago.

Patch 10 contains the list of driver IDs I've annotated and is NOT meant
for merging. Patch 11 contains all the changes on drivers I've annotated
and is NOT meant for merge too. I've no idea how such patches might end
up in the kernel (in fact I've no idea what's necessary for any patch),
therefor these two patches are there to offer a quick start to evaluate
this feature.

The final 3 patches contain a few changes for the DTs of 3 ARM boards.

Because I don't want to fiddle with problems in the unstable mainline tree,
these patches are based on the latest stable kernel (4.2.3). But besides
the big patch (11) with all the driver changes, they apply without problems
to the current mainline too. And the necessary changes to use the big patch
on mainline should be trivial, as all the changes in this patch are trivial
too.

I've tested this patch series with several ARM(32) systems, x86 and x86_64
and the boot time was faster on allmost all systems. Either through
avoiding deferred probes (on single core machines) or by using the
parallel initialization. And keep in mind that, also I've already
annotated quiet a lot of initcalls, a big bunch of them still are left
and therefor are still called in series using just one core and not in
parallel.

Keep in mind that these patches are an offer, I wasn't paid for them and
I don't need them in the mainline kernel.
As long as you are able to use a polite language, you can send me any
comment, regardless if it's good or bad. But, please, keep away with
Baby-Speak or contumelious comments.

And, just in case, I'm aware that adding the necessary dependencies means
some effort and a lot of (trivial) commits and therefor having all
possible initcalls annotated would be a long term goal. But, besides that
this could be done smoothly without any need to hurry, I think it makes
sense. Otherwise, in my humble opinion, the problems to keep an overview
and ordering initcalls will just become worse.


Alexander Holler
------------------------- introductory mail -------------------------