cfenollosa/os-tutorial

[Open letter] long-standing problems with this tutorial and its consequences due to good SEO

lukflug opened this issue · 8 comments

I am writing this because this tutorial seems to be very popular, appearing as one of the top results on many search queries. This means many beginners in operating system development (OSDev) have used this tutorial. Unfortunately, certain aspects of this tutorial have led to many beginners (many of whom I have interacted with on various OSDev online help channels) getting a number of misconceptions. Most of these are not unique to this tutorial (in particular, many seem to originate from the University of Birmingham PDF file, which this and many other tutorials seem to be largely inspired by), but this tutorial seems to be one of the most popular ones, and has spawned many more tutorials with similar issues.

I'd like to begin by pointing out that I must disagree with the characterization of semaphores, paging (which I presume is what is meant by "pagination"), and memory management as "advanced features". They are concepts fundamental to any remotely modern OS. I'd also like to say that reading existing implementations is a good idea to learn about implementing an OS. We all stand on the shoulders of giants, and learning from previous examples (going beyond tutorials) is indispensable. In particular, there exist many educational kernels, like xv6. One has to have an idea of how an OS is structured overall, and have a grand design, before starting to code, otherwise one will only get so far before ending up asking "what next?"

This document will describe many problems with this tutorial, starting with small and specific issues, and then gradually cover more general issues, ending with the inherent problems regarding the general framing itself.

Issues with the explanations

Certain parts of the explanations have helped propagate widespread misconceptions. In the following I will address some of them.

Lesson 2

We will set tty mode only once though in the real world we cannot be sure that the contents of ah are constant. Some other process may run on the CPU while we are sleeping, not clean up properly and leave garbage data on ah.

It is unclear what is meant by "real world", i.e. the conditions under which some other process may run on the CPU while we are sleeping. It isn't even clear what is meant by "sleeping". There's nothing in that particular sequence of instructions that would cause the CPU to drop to a lower power state. Assuming we are talking about our boot sector being preempted by a hardware interrupt, it would be impossible to reliably invoke int 10h if ah were not left alone or restored by the routine servicing the interrupt, since the hardware interrupt could happen between setting ah and int 10h. Since this is the boot sector, the only active hardware interrupt service routines are those set by the BIOS, which should preserve all registers (unless the BIOS has some kind of showstopper bug)! At any rate, hardware interrupts (other than NMI) can be disabled using cli.

Lesson 4

Remember that the bp register stores the base address (i.e. bottom) of the stack, and sp stores the top, and that the stack grows downwards from bp (i.e. sp gets decremented)

It seems strongly implied (in a way consistent with the tutorial from the University of Birmingham) that bp plays a similarly important and symmetric role to sp, i.e. that bp and sp store the boundaries of the stack. This, however, is not the case. bp is functionally a general purpose register, and does not play a special role in any stack-related instruction (e.g. push, pop, call, ret), aside from enter and leave (and in most ABIs bp is the frame pointer, the base of the current stack frame, hence why enter and leave are what they are). bp can have any value the programmer wishes, and be used as a general purpose register as it often is by modern compiler-generated code for x86-64. The only special thing about bp is that the default segment when using it as the base of an effective address is ss and not ds. This misconception is further reinforced by the following sequence of code:

mov bp, some_constant
mov sp, bp

This seems to be a characteristic sequence of code found in this tutorial and the University of Birmingham tutorial, as well as tutorials and OS projects based on these two (which can often be used as a canary in the coal mine for all the commonly found issues). Not only is it completely unnecessary to set bp (you can just set sp directly) if it is never going to be used again, but it disregards segmentation. Since there are no guarantees for the initial value of any of the segment registers, ss needs to be properly set up first. This is because real-mode interrupts use the stack, and never switch the stack, so you must have a valid ss:sp at all times. This means you need to update both atomically. There is a special interrupt shadow after writing to ss, so you can write to sp right after, without risk of an interrupt happening in between (alternatively you can just use the lss instruction). So the mov to sp should happen immediately after setting ss! For example, this can be done as follows:

xor ax, ax
mov ss, ax
mov sp, some_constant

Lesson 5

The familiar '\n' is actually two bytes, the newline char 0x0A and a carriage return 0x0D.

This is not true. '\n' actually corresponds to the character 0x0A (ASCII control character LF, or line feed). UNIX usually terminates lines with that single control character. Under DOS and Windows, 0x0D (ASCII carriage return, or CR) followed by 0x0A, or '\r\n', is more common (i.e. CR LF instead of just LF). In this case, the BIOS uses the CR LF convention (i.e. LF doesn't reset the horizontal position of the cursor).

This misconception may arise due to a misunderstanding of C and stdio.h facilities. The reason '\n' actually outputs '\r\n' on DOS/Windows-based systems and similar is the so-called "binary" flag, which is a property of FILE * objects. When this flag is not set, the C library will intercept '\n' writes and translate them to '\r\n' on the fly. This is done so UNIX-oriented programs expecting '\n' to emit a newline and carriage return at once (practically all of them) will work unmodified under MS-DOS and Windows systems and similar.
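To illustrate the BIOS side of this, here is a minimal sketch of a routine that emits a line break through the int 10h teletype function by sending CR and LF as two separate characters. The label print_newline is hypothetical and not taken from the tutorial; the sketch assumes 16-bit real mode, BIOS services still being available, and display page 0.

```nasm
; Sketch only: print a line break using BIOS teletype output (int 10h, ah = 0Eh).
; Assumption: 16-bit real mode, text output on display page 0.
bits 16

print_newline:
    push ax
    push bx
    xor bx, bx          ; bh = display page 0 (bl is only used in graphics modes)
    mov ax, 0x0e0d      ; ah = 0Eh (teletype), al = 0x0D (CR): cursor back to column 0
    int 0x10
    mov ax, 0x0e0a      ; ah = 0Eh (teletype), al = 0x0A (LF): cursor down one row
    int 0x10
    pop bx
    pop ax
    ret
```

Sending only 0x0A would move the cursor down a row without returning it to the first column, which is why both bytes are written.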

Lesson 8

32-bit mode allows us to use 32 bit registers and memory addressing, protected memory, virtual memory and other advantages, but we will lose BIOS interrupts and we'll need to code the GDT (more on this later)

What the author probably means is 32-bit protected mode. The usage of the term "32-bit mode" is problematic. There exists both 16-bit and 32-bit protected mode, depending on the current code segment (the bits 32/use32 and bits 16/use16 directives just tell the assembler under which kind of code segment the code is meant to be executed). So this means that there is a 16-bit mode where you can have protected and virtual memory. With some trickery, you can even have a 32-bit code segment in real mode. Regardless of that, you can still use 32-bit registers and addressing in 16-bit code (including in regular real mode) and 16-bit registers and addressing in 32-bit code. The bitness of the code segment merely changes the default. The real reason the code from lesson 8 won't work in real mode is that it uses an offset above 64 KiB (namely 0xB8000), which is the limit of real mode segments (this limit can be raised to 4 GiB by entering unreal mode, in which case the code would work just fine).
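As a sketch of both points, the fragment below writes to VGA text memory from plain real mode: it reaches linear address 0xB8000 through a segment value instead of a large offset, and it uses a 32-bit register inside 16-bit code. It assumes a 386 or later CPU and that the BIOS has left the display in a colour text mode with its buffer at 0xB8000; the characters written are arbitrary.

```nasm
; Sketch only: real-mode access to VGA text memory at linear address 0xB8000.
; Assumptions: 386+ CPU, colour text mode active, es free to be overwritten.
bits 16

    mov ax, 0xb800          ; segment 0xB800 -> linear base 0xB800 * 16 = 0xB8000
    mov es, ax

    ; 16-bit store: one cell. Low byte = character, high byte = attribute.
    mov word [es:0x0000], 0x0f58    ; 'X' (0x58), white on black (0x0F), top-left cell

    ; 32-bit register in 16-bit code: the assembler adds an operand-size prefix.
    mov eax, 0x0f4b0f4f     ; bytes in memory: 'O', 0x0F, 'K', 0x0F
    mov [es:0x0002], eax    ; fills the next two cells with "OK"
```

The offsets used here stay far below 64 KiB, so no segment limit is exceeded.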

Lesson 9

As a curiosity, the first GDT entry must be 0x00 to make sure that the programmer didn't make any mistakes managing addresses.

This isn't actually the case. The first GDT entry (descriptor) can contain any data and does not need to be zeroed out. The contents are ignored, and the CPU throws a general protection fault when trying to access memory through a null selector. Also, there are multiple architectural uses for the null selector. For example, when certain segment registers have a lower DPL (descriptor privilege level) than the new CPL (current privilege level) during a far return resulting in a change in CPL (which, in the case of a far return, is always towards a less privileged ring), these registers are set to null. Similarly, ds, es, fs, and gs are cleared when leaving virtual 8086 mode (since the meaning of the selectors changes between virtual 8086 mode and regular protected mode).
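Because the CPU never reads descriptor 0, some bootloaders reuse that slot for unrelated data. The following is a sketch of one such trick, which stores the 6-byte operand for lgdt inside the first slot. All labels are hypothetical; the sketch assumes ds = 0 and a flat binary whose assembled addresses equal linear addresses (for example one built with org 0x7c00).

```nasm
; Sketch only: a flat GDT whose unused first slot holds the lgdt operand.
; Assumptions: ds = 0, assembled addresses equal linear addresses.
bits 16

load_gdt:
    lgdt [gdt]              ; reads 6 bytes: 16-bit limit, then 32-bit base
    ret

align 8
gdt:
    ; Slot 0 (selector 0x00): ignored by the CPU, so it need not be zero.
    dw gdt_end - gdt - 1    ; limit = size of the table in bytes, minus one
    dd gdt                  ; linear base address of the table
    dw 0                    ; padding up to 8 bytes

    ; Slot 1 (selector 0x08): ring 0 code, base 0, limit 4 GiB
    dw 0xffff               ; limit 15:0
    dw 0x0000               ; base 15:0
    db 0x00                 ; base 23:16
    db 0x9a                 ; access: present, ring 0, code, readable
    db 0xcf                 ; flags: 4 KiB granularity, 32-bit; limit 19:16 = 0xF
    db 0x00                 ; base 31:24

    ; Slot 2 (selector 0x10): ring 0 data, base 0, limit 4 GiB
    dw 0xffff               ; limit 15:0
    dw 0x0000               ; base 15:0
    db 0x00                 ; base 23:16
    db 0x92                 ; access: present, ring 0, data, writable
    db 0xcf                 ; flags: 4 KiB granularity, 32-bit; limit 19:16 = 0xF
    db 0x00                 ; base 31:24
gdt_end:
```

Loading a segment register with selector 0x00 still works as a null selector; only memory accesses through it fault.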

Lesson 18

The Interrupt Service Routines run every time the CPU detects an interrupt, which is usually fatal.

This is not the case. CPU exceptions are usually not fatal, and are a fundamental mechanism by which OSes operate. Here and in lesson 19, a bizarre dichotomy between ISRs and IRQs is established. An interrupt service routine is the code invoked after any kind of interrupt, including hardware interrupts. This is also one thing that seems to propagate from tutorial to tutorial to ill-fated project.

Other issues

Issues with the implementation

There are also many bugs and implicit limitations in the code itself, especially (but not only) in the bootloader. The most apparent one is that it assumes segments are initialized to zero, preventing the entire project from working on most real machines. This is particularly characterized by the first instruction being mov [BOOT_DRIVE], dl, which implicitly uses ds without initializing it. Similar to stack initialization, this is another "canary in the coal mine", which is also found in the Birmingham PDF. Since this text is already going on for too long, I will refer to an annotated version of the source code that someone else made. While it is admittedly snarky and/or nitpicky at times, it covers everything more thoroughly than I could ever hope to do here.
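For illustration, a boot sector prologue that does not rely on any initial segment values could look roughly like the sketch below. This is not the tutorial's code: the labels start and .hang and the choice of 0x7c00 as the initial stack pointer are assumptions, and the fragment leaves out the padding and boot signature that a complete boot sector needs.

```nasm
; Sketch only: set up cs, ds, es and ss:sp before touching memory.
; Assumption: the BIOS loaded this code at linear address 0x7C00.
bits 16
org 0x7c00

    jmp 0x0000:start        ; force cs = 0, whether the BIOS used 07C0:0000 or 0000:7C00
start:
    xor ax, ax
    mov ds, ax              ; ds = 0, so [BOOT_DRIVE] matches the org above
    mov es, ax
    mov ss, ax              ; interrupts are held off until after the next instruction
    mov sp, 0x7c00          ; stack grows down from just below the boot sector
    cld                     ; string instructions count upwards

    mov [BOOT_DRIVE], dl    ; now safe: ds is known

.hang:
    hlt
    jmp .hang

BOOT_DRIVE: db 0
```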

Target audience

While the README says "this course is a code tutorial aimed at people who are comfortable with low level computing", it is not clear what is meant by "low level computing". The concepts listed for people to Google seem to send contradictory messages in that regard. Surely low level computing includes being able to deal with raw pointers and C strings? Nonetheless "pointers" in lesson 3, and "strings" in lesson 5 are listed as concepts to be Googled (and not as implicitly assumed knowledge). On the other hand, in lesson 2, when int 10h is introduced, the explanation of what exactly an interrupt is, what the instruction does, and how the BIOS provides services, seems a bit scant. It either presumes that the reader is already familiar with x86 real-mode assembly and BIOS service routines, or that a couple of Google searches can clear it up. However, at least the provided search terms of "interrupts" and "CPU registers" are not enough. In my opinion, to fully understand what is going on, and to be able to write a boot sector in a correct and effective way, one must already be familiar with x86 real-mode assembly (which this tutorial clearly does not assume, since it explains certain, but not all, concepts involved, such as the stack and real-mode segmentation). In the case of the tutorial, one also has to be familiar with NASM syntax (there's no explanation of how exactly times 510-($-$$) db 0 works for example).
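As an example of the kind of NASM-specific explanation that is missing, the padding line can be read as annotated in the sketch below. The surrounding lines are a minimal placeholder boot sector, not the tutorial's exact code.

```nasm
; Sketch only: what "times 510-($-$$) db 0" computes.
bits 16
org 0x7c00

    jmp $                   ; $ = address of this instruction, so this loops forever

    ; $$ = address of the start of the current section.
    ; $ - $$ = number of bytes emitted so far.
    ; 510 - ($ - $$) = bytes still missing up to offset 510.
    ; times N db 0 repeats "db 0" N times, i.e. emits N zero bytes.
    times 510-($-$$) db 0

    dw 0xaa55               ; boot signature: bytes 0x55, 0xAA at offsets 510 and 511
```

If the code before the padding grows beyond 510 bytes, the count becomes negative and NASM refuses to assemble the file, which acts as a built-in size check.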

Inherent issues with boot sector tutorials

What I hope to show here is that writing an OS starting from a boot sector is hard, and requires intimate knowledge of both the architecture and the platform. In my opinion, this knowledge cannot be acquired through Google searches. A prior systematic learning process is required, before one can even begin. And once one has the background knowledge, one can learn to write boot sectors without the help of a tutorial. Thus any tutorial starting with the boot sector is problematic, and very hard to get right. Unfortunately such tutorials are very prevalent, and often misguide beginners to a path with a very steep and frustrating learning curve, which dissuades many from pursuing operating system development further. They also obscure the fact that there are alternative approaches to OSDev, such as using an existing, mature bootloader, which abstracts away architectural and platform-related minutiae of early initialization, and allows one to easily target both BIOS and UEFI without a large amount of extra effort. Additionally, one could argue about the choice of implementation of certain things when it comes to this bootloader, such as choosing the old int 13h functions, and not the newer LBA extensions. The kernel's use of VGA text mode, instead of a linear framebuffer, could also be questioned.
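To give an idea of what the LBA extensions look like, here is a hedged sketch of a routine that reads one sector with int 13h, ah = 42h, after checking that the extensions are present. The labels read_lba and dap, the buffer address 0000:7E00, and the starting sector are assumptions chosen for illustration only.

```nasm
; Sketch only: read one sector using the int 13h LBA extensions.
; In:  dl = BIOS drive number, ds = segment containing dap
; Out: carry flag set on failure
bits 16

read_lba:
    push dx                 ; keep the drive number across the installation check
    mov ah, 0x41            ; int 13h extensions: installation check
    mov bx, 0x55aa
    int 0x13
    pop dx                  ; pop does not modify the flags
    jc .fail                ; carry set: extensions not supported
    cmp bx, 0xaa55
    jne .fail
    test cl, 1              ; bit 0: extended read/write functions available
    jz .fail

    mov si, dap             ; ds:si -> disk address packet
    mov ah, 0x42            ; extended read
    int 0x13                ; carry set on error, ah = status code
    ret
.fail:
    stc
    ret

align 4
dap:
    db 0x10                 ; size of this packet in bytes
    db 0                    ; reserved, must be zero
    dw 1                    ; number of sectors to transfer
    dw 0x7e00               ; destination buffer offset
    dw 0x0000               ; destination buffer segment
    dq 1                    ; starting LBA (sector 1 is the one after the boot sector)
```

Unlike the older CHS functions, the caller does not need to know the disk geometry; the sector is addressed by a single 64-bit number.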

Big picture

The tutorial mainly covers these minutiae (in a flawed and scant way), and not much of the actual meat of kernel design. The reader is left with no sense of how a kernel actually operates. Most of the aspects of the implementation, such as the interface between bootloader and kernel, cannot be scaled up in a sane way. Including a "libc" in the kernel can be misleading. Many of the functions in there are not usually found in a typical userspace libc, nor do they match any relevant specification. In particular, one shouldn't strive to implement an ISO C compliant libc inside the kernel, or implement certain functionality in a compliant way, since a lot of libc functions are simply not useful (or do not make much sense) in a kernel context. Including a rudimentary shell in the kernel is useful for testing keyboard input and screen output, but implementing the shell in the kernel isn't typically productive in the long term (although one could have a small diagnostics shell in the kernel, the main command line the user would be exposed to typically is a userspace program). The bump allocator is also not terribly useful in the long term. Kernel memory management has a lot more layers than just malloc.

Concluding remarks

Thus, I ask for the many outright bugs and errors to be fixed, or for the tutorial to be removed from such a prominent public position, given its excellent SEO, or, at the very least, for there to be a disclaimer to anyone who stumbles upon it (for example, by linking to this very issue). In addition, it would be helpful, in my opinion, if the required knowledge was clarified, alternative approaches were mentioned, and resources explaining theory were linked. Seeing beginners falling into the same pitfalls over and over again is severely disheartening and damaging for the OSDev community.

Special thanks for the valuable contributions and feedback provided by: @mintsuki, @pitust, @DeanoBurrito, @netbsduser, @no92, @qookei

Co-signed by: @lukflug, @mintsuki, @no92, @ElectrodeYT, @pitust, @DeanoBurrito, @Dennisbonke, @streaksu, @portasynthinca3, @xvanc, @solar-mist, @thomtl, @qookei, @hyenasky

Any remaining mistakes found here are my own.

Hello. Somebody wrote to me to let me know of this issue. Let me go straight to the point: I understand the sentiment of this letter. I agree to the proposal and I have linked to this issue in the README.

That being said, I think some of the hostility in this letter is totally unwarranted. I feel like I need to say a few words for those who are interested in an update from the author of the tutorial. It has no technical interest, so if you came here from the README, you can stop reading.

1/ The goal of this repo is plainly stated in the first paragraphs of the README

This is not a university course, it is the work of a hobbyist from nine years ago. A hobbyist who wanted to learn some basic concepts, who was following yet another tutorial, and who decided to upload his findings to the internet in case it can be helpful to other people. Any other interpretation of what this repo is, is a mistake. Please read the README again.

2/ Most of the criticisms in the open letter are valid. There are two categories of arguments that, in my opinion, are not correct:

  • Design decisions: "[the tutorial does not cover] much of the actual meat of kernel design" / "one shouldn't strive to implement an ISO C compliant libc inside the kernel" → This tutorial follows Blundell's document and does not explore alternatives. This is a deliberate decision stated in the README.

  • Lack of clarity about the scope of the tutorial and alternatives: "It would be helpful, in my opinion, if [...], alternative approaches were mentioned" / "this knowledge cannot be acquired through Google searches" → Literally the first section of the README links to the OSDev wiki and the little book about OS development.

3/ About the general sentiment that this tutorial is not good enough and therefore should be removed.

I agree without any sarcasm to the first part. It is old, abandoned, and I did it as a learning project. What I don't agree with is the second part.

During the time that this repo has been available I have received many messages from people who said that it was a good complement to college courses, that it helped them understand some concepts, or that it was just a fun tutorial to follow. I am happy that it brings joy to people. And that is what this is, with its flaws. This repo, as it is right now, brings value. Some people may not agree with that argument; then we will need to agree to disagree. Therefore, I think the right decision is to let people discuss issues and to link to this letter from the README instead of creating more dead links.

4/ Despite the disagreement, a word of thanks

One of the most important lessons I have learned in my life is that criticism, especially criticism that you disagree with, is a sign that people care about your work and want it to be better. To that I want to say thanks. Thanks for taking the time to write this letter and pointing out the flaws in my work. I am sorry that I cannot make it substantially better, as I don't have the time for that. Let this be a historical artifact from 2014, like Blundell's tutorial, and others from the era, with all its caveats, warnings, and disclaimers.

@cfenollosa have you considered archiving the repository on top of this, using GitHub's archival button in settings, so a banner is shown saying that the repository is old and archived?

This is a totally reversible action and I believe it may further drive home the point that if you follow this tutorial, you're on your own.

Thanks.

@cfenollosa Thank you very much for adding the disclaimer and for taking the time and writing a response. I appreciate it very much. I know the original criticism was harsh, but as you pointed out in your response, it's because we "care about your work and want it to be better". In particular, a number of people who helped me write this issue and co-signed it, used this tutorial as a starting point.

Hi All, to dip my toe into this conversation. I also have been working on my own OS. I started out by working through the Bare Bones, and Meaty Skeleton tutorials from OSDEV.org. I pulled in many resources from the internet; but eventually referenced a large part from Carlos' work. Something about his work appealed to me as a beginner. I know that what I have so far could hardly be called a true OS; but it works and has encouraged me to get this far, and possibly go back with a more serious approach. I was considering putting my work to date up on Git because I thought it may help someone else with some issue; but after reading all this I'm reconsidering.
I realise there may be issues with these 'beginners' works, but Carlos gets a big THANKS from me.

Not just good SEO, but there is also a clueless YouTuber recommending this tutorial:

https://www.youtube.com/watch?v=C-pjKgNlzeQ

I read the entire discussion and understood it. Having said that, what would be the best way to learn about the process of programming an OS, from scratch? Be it links, pdfs or any GitHub tutorials. (I already researched about this but did not succeed in finding much. Also, the university I'm enrolled in does not provide this knowledge, as of yet.)

srmo commented

@LeonardoWrk probably here https://wiki.osdev.org/Babystep1 and the rest of the Wiki. It's rather formal in tone and might feel intimidating. They have several sections for beginners and even implement a "difficulty" system that should help you find your way. I'm switching between the Wiki, https://www.cs.bham.ac.uk/~exr/lectures/opsys/10_11/lectures/os-dev.pdf (which sadly never was completed and seems wrong in some places) and other sources.
Good luck and have fun!

I read the entire discussion and understood it. Having said that, what would be the best way to learn about the process of programming an OS, from scratch? Be it links, pdfs or any GitHub tutorials. (I already researched about this but did not succeed in finding much. Also, the university I'm enrolled in does not provide this knowledge, as of yet.)

Hi LeonardoWrk, I pretty much agree with srmo. The https://wiki.osdev.org site is a great resource, but as he said 'it can be a bit intimidating'. cfenollosa's tutorial got me to a happy point sooner than I otherwise would have. I have an OS that boots, displays a splash screen, has a status bar that displays the time, and I can type some simple commands into. That being said, I have put my project on hold for now. If/when I do get back to it, I will be following the osdev.org examples to create a 'real' system.
Good luck with your project, OldCrank