This is a repo contains my approach to csapp self-study labs and notes for the course 15-213 lecture\reading.
Resource:
15-213/15-513 Introduction to Computer Systems
[TOC]
leaq Src, Dst
- Src is address mode expression
- Set Dst to address denoted by expression
It does not access the memory, just loads address to Dst.
Some Arithmetic Operations:
Information about currently executing program
- Temporary data: (
%rax
, ...) - Location of runtime stack: (
%rsp
) - Location of current code control point: (
%rip
, ...) - Status of recent test: (CF, ZF, SF, OF)
Notice: 4 bytes computations will set high 32 bits of registers to be zero, it is a rule. And 2 bytes computations do not have such rule.
Condition Codes (Implicit Setting)
CF set when Carry in add operations and Borrow in sub operations.
SetX instructions
SetX argument is always a low byte (%al
, %r8b
, etc.)
Conditional move instructions.
Test
instruction executes logical AND for two operands, then set flags by the result.
CF
and OF
should always set to 0.
The design choices of mechanisms in procedures make up the Application Binary Interface (ABI).
- Passing control
- Passing data
- Managing local data
Stack allocated in Frames: state for single procedure instantiation, stores
- Arguments
- Local variables
- Return pointer
Register Saving Conventions "Caller Saved" v.s. "Callee Saved"
Nested array vs Multi-level array
Fields ordered according to declaration order, even if a more compact ordering exists.
Compiler determines overall size and positions of fields.
For arrays of structures: no padding between array elements.
To save space, put large data types first.
Arguments passed in %xmm0
, %xmm1
, etc.
Result returned in %xmm0
, and all XMM registers are Caller Saved.
buffer overflow: exceed the memory size allocated for an array.
Mostly caused by unchecked lengths on string inputs.
Take care of string libary code:
gets
: get string.
strcpy
, strcat
: Copy strings of arbitrary length.
scanf
, fscanf
, sscanf
, when given %s
conversion specification.
The EOF \0
itself also occupies a char.
Use:
fgets
instead of gets
.
strncpy
instead of strcpy
.
Don't use scanf
with %s
conversion specification: use fgets
or use %ns
where n
is a suitable integer.
System-Level protections include:
- Randomized stack offsets: at start of program, allocate random amount of space on stack and shifts stack addresses for entire program.
- Non-executable memory: x86-64 added a way to mark regions of memory as not executable, programs will crash on jumping into any such region.
- Stack canaries: place special value ("canary") on stack just beyond buffer and check for corruption before exiting function.
for canary check:
usually use xor %fs:0x28,%rax
compare to canary.
Stack randomization and marking stack non-executable makes it hard to predict buffer location and to insert binary code.
Alternative strategy is to use existing code.
TODO
RAM (Random-Access Memory) is the main memory building block, basic storage unit is normally a cell (one bit per cell). System "main memory" comprises multiple RAM chips.
Two main varieties of RAM: SRAM (Static RAM), DRAM (Dynamic RAM).
DRAM: 1 transistor + 1 capacitor per bit, must refresh state periodically. SRAM: 6 transistor per bit, holds state indefinitely.
Some properties of hardware and software:
- Fast storage costs more per byte and has less capacity.
- The gap between CPU and main memory speed is widening (CPU is getting faster and faster).
- Well-written programs tend to exhibit good locality.
These properties lead to an approach for organizing memory and storage systems known as a memory hierarchy.
Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger slower device.
Why cache? speed gap of two near storage layer + locality of programs
placement policy and replacement policy.
Storage tech:
- Magnetic Disks
- Nonvolatile (Flash) Memory
Disk access time = seek time + rotational latency + transfer time
direct memory access (DMA)
Synchronous DRAM (SDRAM): uses a conventional clock signal instead of asynchronous control.
Double data-rate synchronous DRAM (DDR SDRAM):
- Double edge clocking sends two bits per cycle per pin
- Size of small perfetch buffer: DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits), DDR4 (16bits), DDR5
If a line is evicted and dirty bit is set to 1, the entire block of
Why index using middle bits?
If use high bits indexing, then near addresses will be mapped into the same set.
Why 99% hits is twice as good as 97%?
97% hits: 1 cycle + 0.03 x 100 cycles = 4 cycles
99% hits: 1 cycle + 0.01 x 100 cycles = 2 cycles
This is why "miss rate" is used instead of "hit rate".
Read throughput (read bandwidth): number of bytes read from memory per second (MB/s)
No blocking:
Blocking:
Use largest block size
Why Linkers?
Modularity:
Programs can be written as a collection of smaller source files, rather than one monolithic mass.
Efficiency:
Time: separate compilation
Space: libraries
- static linking: executable files and running memory images contain only the library code they actually use.
- dynamic linking: during execution, single copy of library code can be shared across all executing processes.
What do linkers do?
Symbol resolution
symbols: global variables and functions.
void swap() {...} /* define symbol swap */
swap() /* reference symbol swap */
int *xp = &x; /* define symbol xp, reference x */
Symbol definitions are stored in object file (by assembler) in symbol table, which is an array of entries, each entry includes name, size and location of symbol.
Relocation
Merges separate code and data sections into single sections.
Relocates symbols from their relative locations in the .o
files to their final absolute memory locations in the executable.
Updates all references to these symbols to reflect their new positions.
Local non-static C variables stored on the stack.
Local static C variables stored in either .bss
or .data
.
Create local symbols in the symbol table with unique names, e.g., x
, x.1721
and x.1724
.
Strong symbols: procedures and initialized globals.
Weak symbols: uninitialized globals or ones declared with specifier extern
.
example of extern
in .h
files
Relocation:
TODO: relocation rules
Linear address space: Ordered set of contiguous non-negative integer addresses:
Virtual address space: Set of
Physical address space: Set of
Why virtual memory?
- uses main memory efficiently (use DRAM as a cache for parts of a virtual address space).
- simplifies memory management: each process gets the same uniform linear address space.
- isolate address space.
TODO: lec15
TODO: lec16
TODO: lec17
Definition: A process is an instance of a running program.
Process provides each program with two key abstractions:
- Private address space
- Logical control flow
From startup to shutdown, each CPU core simply reads and executes a sequence of machine instructions, one at a time, this sequence is the CPU's control flow.
Control flow passes from one process to another via a context switch.
syscall example:
- read/write files
- get current time
- allocate RAM (sbrk)
Almost all system-level operations can fail. On error, most system-level functions retuan -1 and set global variable errno
to indicate cause.
Obtaining process IDs:
pid_t getpid(void); // return PID of current process
pid_t getppid(void); // return PID of parent process
Process states:
- Running: either executing or could be executing if there were enough CPU cores.
- Blocked/Sleeping: cannot execute until some external event happens (usually I/O).
- Stopped: has been prevented from executing by user action (Ctrl + Z).
- Terminated/Zombie: process is finished and parent process has not yet been notified.
Parent process creates a new runnig child process by calling fork.
int fork(void);
returns 0 to the child process, child's PID to parent process.
called once but returns twice.
If child process exits without notifying the parent process, then it becomes a "zombie", make sure wait
or waitpid
has been used to terminate child.
关于 waitpid
:
wait(&status)
is equivalent to waitpid(-1, &status, 0)
.
A shell is an application program that runs programs on behalf of the user
Up to now: two mechanisms for changing control flow:
- jumps and branches
- call and return
react to changes in program state
Insufficient for a useful system: difficult to react to changes in system state
- data arrives from a disk or a network adapter
- instruction divides by zero
- user hits Ctrl-C at the keyboard
- system timer expires
That is why system needs mechanisms for "exceptional control flow".
An exception is a transfer of control to OS kernel in response to some event.
- Kernel is the memory-resident part of the OS
- Examples of events: divide by 0, arithmetic overflow, page fault, I/O request completes, typing Ctrl-C
there is an exception table in memory
- Asynchronous ECF
- Interrupts
- Synchronous ECF
- Traps
- Faults
- Aborts
ECF exists at all levels of a system
- exceptions: hardware and operating system kernel software
- process context switch: hardware timer and kernel software
- signals: kernel software and application software
- nonlocal jumps: application code
The kernel will interrupt regular processing to alert us when a background process completes. In Unix, the alert mechanism is called a signal.
Kernel sends a signal to a destination process by updating some state in the context of the destination process.
Kernel maintains pending
and blocked
bit vectors in the context of each process
pending
: represents the set of pending signals- Kernel sets bit k in
pending
when a signal of type k is sent - Kernel clears bit k in
pending
when a signal of type k is received
- Kernel sets bit k in
blocked
: represents the set of blocked signals- Can be set and cleared by using the
sigprocmask
function - Also referred to as the
signal mask
- Can be set and cleared by using the
Typing ctrl-c (ctrl-z) causes the kernel to send a SIGINT (SIGTSTP) to every job in the foreground process group
- SIGINT - default action is to terminate each process
- SIGTSTP - default action is to stop (suspend) each process
Kernel computes pnb = pending & ~blocked
Each signal type has a predefined default action, which is one of:
- The process terminates
- The process stops until restarted by a SIGCONT signal
- The process ignores the signal
进程可以通过使用 signal 函数修改和信号相关联的默认行为。唯一的例外是 SIGSTOP 和 SIGKILL,它们的默认行为是不能修改的。
隐私阻塞机制:主程序捕获到信号 s,该信号会中断主程序,将控制转移到处理程序 S。S 在运行时,程序捕获信号 t≠s,该信号会中断 S,控制转移到处理程序 T。当 T 返回时,S 从它被中断的地方继续执行。最后,S 返回,控制传送回主程序,主程序从它被中断的地方继续执行。
显式阻塞机制:应用程序可以使用 sigprocmask 函数和它的辅助函数,明确地阻塞和解除阻塞选定的信号。
#include <signal.h>
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
int sigemptyset(sigset_t *set);
int sigfillset(sigset_t *set);
int sigaddset(sigset_t *set, int signum);
int sigdelset(sigset_t *set, int signum);
//返回;如果成功则为 0,若出错则为 -1。
int sigismember(const sigset_t *set, int signum);
// 返回:若 signum 是 set 的成员则为 1,如果不是则为 0,若出错则为 -1。
sigprocmask 函数改变当前阻塞的信号集合。具体的行为依赖于 how 的值:
- SIG_BLOCK: 把 set 中的信号添加到 blocked 中 (blocked = blocked | set)。
- SIG_UNBLOCK: 把 blocked 中删除 set 中的信号 (blocked = blocked & ~set)。
- SIG_SETMASK: block = set。
signal 处理流程
Function is async-signal-safe if either reentrant or non-interruptible by signals.
on the list:
_exit
, write
, wait
, waitpid
, sleep
, kill
not on the list:
printf
, sprintf
, malloc
, exit
信号的一个与直觉不符的方面是未处理的信号是不排队的。因为 pending
位向量中每种类型的信号只对应有一位,所以每种类型最多只能有一个未处理的信号。因此,如果两个类型 k 的信号发送给一个目的进程,而因为目的进程当前正在执行信号 k 的处理程序,所以信号 k 被阻塞了,那么第二个信号就简单地被丢弃了,它不会排队。
正确使用 sigprocmask
函数和它的辅助函数消除竞争
Explicitly waiting for signals
use sigsuspend
:
int sigsuspend(const sigset_t *mask);
// Equivalent to atomic version of:
sigprocmask(SIG_SETMASK, &mask, &prev);
pause();
sigprocmask(SIG_SETMASK, &prev, NULL);
TODO: examples
一些位操作的魔法:
/*
* if x = 0, then y = 0,
* otherwise y = 0x1.
*/
int y = !!x;
/*
* y = 0 -> z = 0;
* y = 0x1 -> z = 0xffffffff;
* use this to generate mask!
*/
int z = ~y + 1;
区分 -1
和 t_max
, t_min
:
对于 t_max
和 t_min