Building a text editor from scratch in order to understand how a text editor works.

https://asciinema.org/a/wGQ9u09MwglWub59mJp6UY5Jv.svg

Read

https://viewsourcecode.org/snaptoken/kilo/index.html

Usage

make
./zsxh <filename>

Keys:

Ctrl-S: Save
Ctrl-Q: Quit
Ctrl-F: Find

Note

1. ASCII

http://www.asciitable.com/

# get keypress scancode
showkey -a

ASCII codes 0-31 and 127 are control characters (nonprintable).
ASCII codes 32-126 are all printable.

  • escape sequence
    Arrow keys, Page Up, Page Down, Home, End and similar keys send 3 or 4 bytes as input: 27, '[', and then one or two other characters. Every escape sequence starts with a 27 byte. Pressing Escape on its own sends a single 27 byte as input.
  • Backspace is byte 127
  • Enter is byte 10, which is a newline character, also known as '\n'
  • Ctrl-A is 1, Ctrl-B is 2, …
  • Ctrl-S stops the program from sending you output; Ctrl-Q resumes the output
  • Ctrl-Z suspends the program to the background; run the fg command to bring it back to the foreground
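
The byte patterns in the list above can be turned into a small decoder. This is only a sketch: the enum and the function name decode_key are hypothetical and not taken from the editor's source, and it assumes the whole key has already been read into a buffer. It covers the arrow keys (final byte A, B, C, D) and Page Up / Page Down (5~ and 6~).

```c
#include <stddef.h>

/* Hypothetical key codes, for illustration only. */
enum key {
    KEY_NONE, KEY_ESCAPE, KEY_UP, KEY_DOWN, KEY_RIGHT, KEY_LEFT,
    KEY_PAGE_UP, KEY_PAGE_DOWN, KEY_OTHER
};

/* Decode the bytes produced by one keypress. */
enum key decode_key(const unsigned char *buf, size_t len) {
    if (len == 0) return KEY_NONE;
    if (buf[0] != 27) return KEY_OTHER;        /* not an escape sequence */
    if (len == 1) return KEY_ESCAPE;           /* a lone 27 byte: Escape key */
    if (buf[1] != '[' || len < 3) return KEY_OTHER;

    if (len == 3) {                            /* 27, '[', one more byte */
        switch (buf[2]) {
        case 'A': return KEY_UP;
        case 'B': return KEY_DOWN;
        case 'C': return KEY_RIGHT;
        case 'D': return KEY_LEFT;
        default:  return KEY_OTHER;
        }
    }
    if (len == 4 && buf[3] == '~') {           /* 27, '[', digit, '~' */
        if (buf[2] == '5') return KEY_PAGE_UP;
        if (buf[2] == '6') return KEY_PAGE_DOWN;
    }
    return KEY_OTHER;
}
```

For example, decode_key((const unsigned char *)"\x1b[A", 3) yields KEY_UP, while a single 27 byte yields KEY_ESCAPE.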

Ctrl-A is 1, Ctrl-B is 2, Ctrl-C is 3, …, Ctrl-J is 10, Ctrl-K is 11, Ctrl-L is 12, Ctrl-M is 10 (not the expected 13), …

Q: Since Ctrl-J is 10, why is Ctrl-M also 10?
A: Ctrl-M sends a carriage return (13, '\r'), but by default the terminal helpfully translates any carriage return typed by the user into a newline (10, '\n'). The Enter key produces 10 for the same reason.

Turn off ICRNL (from <termios.h>; the I stands for "input flag", CR for "carriage return", and NL for "new line") and Ctrl-M is read as 13, as is the Enter key.
The terminal does a similar translation on the output side: it translates each newline ("\n") we print into a carriage return followed by a newline ("\r\n"). Turn off OPOST (also from <termios.h>; the O means it is an output flag, and POST presumably stands for "post-processing of output") and you will see that the newline characters we print only move the cursor down, not to the left side of the screen.
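
As a sketch of the two flags described above, the helper below clears ICRNL and OPOST in a copy of a termios structure. The function name disable_newline_translation is hypothetical; the editor itself may set these flags inline together with others.

```c
#include <termios.h>

/* Return a copy of `t` with carriage-return-to-newline input translation
 * (ICRNL) and output post-processing (OPOST) switched off.
 * All other flags are left untouched. */
struct termios disable_newline_translation(struct termios t) {
    t.c_iflag &= ~(tcflag_t)ICRNL;   /* Ctrl-M and Enter are now read as 13 */
    t.c_oflag &= ~(tcflag_t)OPOST;   /* "\n" is no longer expanded to "\r\n" */
    return t;
}
```

In a real program the structure would come from tcgetattr(STDIN_FILENO, &t), and the result would be applied with tcsetattr(STDIN_FILENO, TCSAFLUSH, &t). Once OPOST is off, every line printed needs an explicit "\r\n".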

2. Escape sequence

An escape sequence always starts with an escape character (27 in decimal, or \x1b as a byte) followed by a [ character.
Escape sequences instruct the terminal to do various text formatting tasks, such as coloring text, moving the cursor around, and clearing parts of the screen. Escape sequence commands take arguments, which come before the command.

For example, we use the J command (Erase In Display) to clear the screen: <esc>[2J clears the entire screen, <esc>[1J clears the screen up to where the cursor is, and <esc>[0J clears the screen from the cursor to the end of the screen. 0 is the default argument for J, so <esc>[J by itself also clears the screen from the cursor to the end.
We will be mostly using VT100 escape sequences, which are supported very widely by modern terminal emulators.
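
To make the argument-before-command layout concrete, here is a minimal sketch that formats an Erase In Display sequence into a buffer. The function name erase_in_display is hypothetical, and restricting the argument to 0, 1 and 2 is an assumption based on the three modes described above.

```c
#include <stddef.h>
#include <stdio.h>

/* Format "<esc>[<mode>J" into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the mode is not 0, 1 or 2 or the buffer is too small. */
int erase_in_display(char *buf, size_t size, int mode) {
    if (mode < 0 || mode > 2) return -1;
    int n = snprintf(buf, size, "\x1b[%dJ", mode);
    if (n < 0 || (size_t)n >= size) return -1;
    return n;
}
```

With mode 2 the buffer holds the 4 bytes 27, '[', '2', 'J', which can then be sent to the terminal with write(STDOUT_FILENO, buf, 4).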

Pressing Ctrl-[ is the same as pressing the Escape key, for the same reason that Ctrl-M is the same as pressing Enter: Ctrl clears the 6th and 7th bits (counting from 1) of the character you type in combination with it. You can adjust the VTIME value in enableRawMode to change the read() timeout, which is measured in tenths of a second; if VTIME is large enough, key combinations become easier to type.
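
The bit clearing can be checked with a one-line helper. This is a sketch: the name ctrl_key is hypothetical, and the 0x1f mask keeps only the lowest five bits, which for 7-bit ASCII input has the same effect as clearing the 6th and 7th bits.

```c
/* Map a character to the byte the terminal sends when it is pressed
 * together with Ctrl, by keeping only the lowest five bits. */
int ctrl_key(int c) {
    return c & 0x1f;
}
```

With this, ctrl_key('a') is 1, ctrl_key('m') is 13 (carriage return), ctrl_key('q') is 17, and ctrl_key('[') is 27, the escape character.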

The m command (Select Graphic Rendition) causes the text printed after it to be printed with various possible attributes including bold (1), underscore (4), blink (5), and inverted colors (7). For example, you could specify all of these attributes using the command <esc>[1;4;5;7m. An argument of 0 clears all attributes, and is the default argument, so we use <esc>[m to go back to normal text formatting. The VT100 User Guide doesn’t document color, so let’s turn to the Wikipedia article on ANSI escape codes.
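
A sequence with several attributes can be assembled as sketched below. The function name build_sgr and its signature are hypothetical, chosen only for illustration; attributes are joined with ';' and an empty list produces <esc>[m.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "<esc>[a;b;c m" (without the space) from a list of attributes.
 * Returns the length of the sequence, or -1 if buf is too small. */
int build_sgr(char *buf, size_t size, const int *attrs, size_t count) {
    int n = snprintf(buf, size, "\x1b[");
    if (n < 0 || (size_t)n >= size) return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        /* Separate arguments with ';', but not before the first one. */
        n = snprintf(buf + used, size - used, "%s%d", i ? ";" : "", attrs[i]);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }

    if (used + 2 > size) return -1;   /* room for 'm' and the NUL */
    buf[used++] = 'm';
    buf[used] = '\0';
    return (int)used;
}
```

Passing the attributes 1, 4, 5, 7 gives the 10-byte sequence <esc>[1;4;5;7m from the text; passing no attributes gives the 3-byte reset sequence <esc>[m.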

3. Compiler

We compile with -std=c11 -Wpedantic, which means that the usual default feature macros are not defined and POSIX functions are not declared unless you explicitly define one of _POSIX_C_SOURCE or _XOPEN_SOURCE.
So we need to define a feature test macro. We add the macros above our includes, because the header files we're including use them to decide what features to expose.

/*** includes ***/

#define _DEFAULT_SOURCE
#define _BSD_SOURCE
#define _GNU_SOURCE

#include <stdio.h>

4. Rendering nonprintable characters

The length of a tab is up to the terminal being used and its settings.
Nonprintable characters are rendered as a ^ character followed by another character, such as ^A for the Ctrl-A character (this is a common way to display control characters in the terminal). We use iscntrl() to check whether the current character is a control character. If so, we translate it into a printable character by adding its value to '@' (in ASCII, the capital letters of the alphabet come right after the @ character), or by using the '?' character if it is not in the alphabetic range.
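
The translation described above can be sketched as follows. The function name render_char and the 3-byte output buffer are hypothetical; the cutoff at 26 (Ctrl-Z) for the alphabetic range is an assumption about where "alphabetic" ends.

```c
#include <ctype.h>

/* Write the printable form of `c` into out (at least 3 bytes) and
 * return its length: "^X" for control characters, the character
 * itself otherwise. */
int render_char(unsigned char c, char out[3]) {
    if (iscntrl(c)) {
        out[0] = '^';
        /* 0..26 map to '@', 'A'..'Z'; anything else becomes '?'. */
        out[1] = (c <= 26) ? (char)('@' + c) : '?';
        out[2] = '\0';
        return 2;
    }
    out[0] = (char)c;
    out[1] = '\0';
    return 1;
}
```

Byte 1 (Ctrl-A) renders as ^A, byte 127 renders as ^?, and a printable character such as 'x' is passed through unchanged.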

See Also

  • kilo - A text editor in less than 1000 LOC with syntax highlight and search.
  • kibi - A text editor in ≤1024 lines of code, written in Rust