Author: Akiyama
Create : 13 Feb, 2022
Update: 28 Sep, 2022
First part of Operating System: The Three Easy Pieces
$T_{turnaround} = T_{complection} - T_{arrival}$ $T_{response} = T_{firstrun} - T_{arrival}$
- Bad in both Response Time and Turnaround TIme
- Non-preemptive scheduler
- Non-preemptive scheduler
- Bad in Response Time
- Improved in Turnaround Time
- Preemptive scheduler
- Much improved in Turnaround Time
- Bad in Response Time
- Good in Reponse Time.
- Less the Time Slice, better the Response Time
- Bad in Turnaroung Time
- Good Turnaround Time
- Good Response Time
Unfairness Metric : $ U = Time_{A} / Time_{B}$ , where $ U = 1 $ will get the best fairness
- Only be precise in large job scale, because we use randomness
- Use List structure for implementation
-
Avoid randomly unfairness even in small number of jobs
-
Pick the process with lowest pass value. , run it with given stride of time, and increase its pass value, and push it back to queue finally
-
It have global state , so it's worse than Stride Scheduling in some way
- Pick process with lowest vruntime, CFS use
sched_latency
/ number of process, to determine the time of the process to run, but it cannot be lower thanmin_granularity
- Introduced Weighting to calculate vruntime
- Use RB-Tree to store running processes
- Isolation: even some kernal are divided into serveral microkernal
- Transparency: cannot be deteceted from process that its memory adrdress is virtual
- Efficiency: on time and space
-
Any address A programmer can print out is a virtual address
-
Allocation and deallocation is implicitly managed by compiler
-
malloc()
usebrk
andsbrk
system calls to implement itself
#include <stdlib.h>
...
void *malloc(size_t size);
sizeof()
is a compile-time operator, meaning that the actual size is know at compile time
Think it as an operator, but not a function call(which would take place at run time.
To allocate space for a double-precision floating point value, you simply do this:
double *d = (double *) malloc(sizeof(double));
Another place to be careful is with strings. When declaring space for a string, use the following:
malloc(strlen(s) + 1) //adds 1 to it in order to make room for the end-of-string character
It returns a void *p
, then the programmer do the cast to tell the compiler to do the right thing
int *x = malloc(10 * sizeof(int));
...
free(x);
-
Forgetting to free memory
-
Freeing memory before you're done with it
-
Freeing memory repeatly
-
Call
free()
with wrong argument
In early days, we use software called loader takes an executable, rewrite its all address by adding desired offset. It does not have protection, and hardware support is needed for real protection!
-
Required hardware registers: base register + bounds(also called limit) register, they are called memory management unit(MMU)
-
Translation: $ address_{physical} = address_{virtual} + base $
-
Happens at run time, that's why it's called "dynamic"
Hardware Requirements | Notes |
---|---|
Privileged mode | Needed to prevent user-mode processes from executing privileged operations |
Base/bounds registers | Need pair of registers per CPU to support address translation and bounds checks |
Ability to translate virtual addresses and check if within bounds | Circuitry to do translations and check limits; in this case, quite simple |
Privileged instruction(s) to update base/bounds | OS must be able to set these values before letting a user program run |
Privileged instruction(s) to register + exception handlers | OS must be able to tell hardware what code to run if exception occurs |
Ability to raise exceptions | When processes try to access privileged instructions or out-of-bounds memory |
OS Requirements | Notes |
---|---|
Memory management | Need to allocate memory for new processes; Reclaim memory from terminated processes; Generally manage memory via free list |
Base/bounds management | Must set base/bounds properly upon context switch |
Exception handling | Code to run when exceptions arise; likely action is to terminate offending process |
Code, stack and heap size are smaller than the allocated unit, cause memory wasted
- 2 bits for segmentation, 12 bits for offset
// get top 2 bits of 14-bit VA
Segment = (VirtualAddress & SEG_MASK) >> SEG_SHIFT
// now get offset
Offset = VirtualAddress & OFFSET_MASK
if (Offset >= Bounds[Segment])
RaiseException(PROTECTION_FAULT)
else{
PhysAddr = Base[Segment] + Offset
Register = AccessMemory(PhysAddr)
}
- 1 bit to indicate if "grows positive"
- Protection bits to indicate if read / execute / write
The process’s memory request cannot be fulfilled because the memory are in many "little holes", which cause waste of memory
Second part of Operating System: The Three Easy Pieces
- Like a separate process, but share the same address space and can access same data
- Thread Control Blocks store the sate of each thread of a process, when context switch occurs
- Each thread has a stack, but it will be small
- Fairness
- Correct Mutual Exclusion
- Performance
1 void lock() {
2 DisableInterrupts();
3 }
4 void unlock() {
5 EnableInterrupts();
6 }
Drawbacks:
-
Do not support multiple-processors
-
Need give priviledge to dreedy program, they may call lock() forever
-
Lost data when "turn off interupts", e.g. Disk read data and ready to interupt to store them
-
Low performance for CPU to "turn on/ off interrupts"
Advantage:
- It's ok for OS itself to do so in some cases, no security problem
Atomic exchange instruction from hardware to build mutual exclusion primitive:
1 int TestAndSet(int *old_ptr, int new) {
2 int old = *old_ptr; // fetch old value at old_ptr
3 *old_ptr = new; // store 'new' into old_ptr
4 return old; // return the old value
5 }
This instruction is supportted by almost all systems now
A simple spin lock
1 typedef struct lock_t {
2 int flag;
3 } lock_t;
4
5 void init(lock_t *lock) {
6 // 0 indicates that lock is available, 1 that it is held
7 lock->flag = 0;
8 }
9
10 void lock(lock_t *lock) {
11 while (TestAndSet(&lock->flag, 1) == 1)
12 ; // spin-wait (do nothing)
13 }
14
15 void unlock(lock_t *lock) {
16 lock->flag = 0;
17 }
- Correct on mutual exclusion
- No fairness
- May result bad performance: 1 thread get lock, N - 1 threads spin for a tick to wait for lock
Instruction provided by SPARC and x86
1 int CompareAndSwap(int *ptr, int expected, int new) {
2 int actual = *ptr;
3 if (actual == expected)
4 *ptr = new;
5 return actual;
6 }
and replace lock()
function above
1 void lock(lock_t *lock) {
2 while (CompareAndSwap(&lock->flag, 0, 1) == 1)
3 ; // spin
4 }
Test-And-Set : assign value of *prt
in each call
Compare-And-Swap : assgin value of *prt
only for 1 time
Instruction provided by ARM, PowerPC, Alpha
1 int LoadLinked(int *ptr) {
2 return *ptr;
3 }
4
5 int StoreConditional(int *ptr, int value) {
6 if (no one has updated *ptr since the LoadLinked to this address) {
7 *ptr = value;
8 return 1; // success!
9 } else {
10 return 0; // failed to update
11 }
12 }
Lock Implemenation:
1 void lock(lock_t *lock) {
2 while (1) {
3 while (LoadLinked(&lock->flag) == 1)
4 ; // spin until it's zero
5 if (StoreConditional(&lock->flag, 1) == 1)
6 return; // if set-it-to-1 was a success: all done
7 // otherwise: try it all over again
8 }
9 }
10
11 void unlock(lock_t *lock) {
12 lock->flag = 0;
13 }
Intruction by Mellor-Crummey & Michael Scott
1 int FetchAndAdd(int *ptr) {
2 int old = *ptr;
3 *ptr = old + 1;
4 return old;
5 }
Lock Implementation:
1 typedef struct lock_t {
2 int ticket;
3 int turn;
4 } lock_t;
5
6 void lock_init(lock_t *lock) {
7 lock->ticket = 0;
8 lock->turn = 0;
9 }
10
11 void lock(lock_t *lock) {
12 int myturn = FetchAndAdd(&lock->ticket);
13 while (lock->turn != myturn)
14 ; // spin
15 }
16
17 void unlock(lock_t *lock) {
18 FetchAndAdd(&lock->turn);
19 }
Advantage: it ensures all thread which get the ticket can be scheduled, imporved starvtation problem
Neither spin or yield cannot prevent waste of CPU or starving , Solaris provide park()
and unpark()
to implement
-
Improve startvation by using queue
-
Save CPU waste from freqeut context swtich by make the thread to sleep
-
Implementation highlight: It set lock to flag part in
lock()
function
1 typedef struct lock_t {
2 int flag;
3 int guard;
4 queue_t *q;
5 } lock_t;
6
7 void lock_init(lock_t *m) {
8 m->flag = 0;
9 m->guard = 0;
10 queue_init(m->q);
11 }
12
13 void lock(lock_t *m) {
14 while (TestAndSet(&m->guard, 1) == 1)
15 ; //acquire guard lock by spinning
16 if (m->flag == 0) {
17 m->flag = 1; // lock is acquired
18 m->guard = 0;
19 } else {
20 queue_add(m->q, gettid());
21 m->guard = 0;
22 park();
23 }
24 }
25
26 void unlock(lock_t *m) {
27 while (TestAndSet(&m->guard, 1) == 1)
28 ; //acquire guard lock by spinning
29 if (queue_empty(m->q))
30 m->flag = 0; // let go of lock; no one wants it
31 else
32 unpark(queue_remove(m->q)); // hold lock (for next thread!)
33 m->guard = 0;
34 }
Linux use futex
interface to implement similar things
What if threads want to check whether a condition is true before continuing its execution? E.g. a parent thread want to wait its child thread to be done.
We introduce condition variables:
- It's a queue
- Thread can add them into it when condition is not satified
- Other thread can change the condition, can wake thread waiting on the condtion by sending signal
You can declare a condition variable :pthread_cond_t c;
There' re two operation wait()
and signal()
for condition varaible:
pthread_cond_wait(pthread_cond_t *c, pthread_mutex_t *m);
pthread_cond_signal(pthread_cond_t *c);
POSIX calls look like:
1 int done = 0;
2 pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
3 pthread_cond_t c = PTHREAD_COND_INITIALIZER;
4
5 void thr_exit() {
6 Pthread_mutex_lock(&m);
7 done = 1;
8 Pthread_cond_signal(&c);
9 Pthread_mutex_unlock(&m);
10 }
11
12 void *child(void *arg) {
13 printf("child\n");
14 thr_exit();
15 return NULL;
16 }
17
18 void thr_join() {
19 Pthread_mutex_lock(&m);
20 while (done == 0)
21 Pthread_cond_wait(&c, &m);
22 Pthread_mutex_unlock(&m);
23 }
24
25 int main(int argc, char *argv[]) {
26 printf("parent: begin\n");
27 pthread_t p;
28 Pthread_create(&p, NULL, child, NULL);
29 thr_join();
30 printf("parent: end\n");
31 return 0;
32 }
-
wait()
will assume thread is holding the lockmutex
when it's called, then releasemutex
-
State variable
done
is the key that stores the information that the threads want to know. Like if it's the right time for parent thread towait()
in this example. -
Always hold the lock when sending
signal()
to prevent race condition, and stay healthy :)
- Mesa semantics: refer to "sending signal to thread only means condition changed, but no gaurantee it's the expect value before the thread's execution "
- Hoare semantics: The contrast of Mesa semanics
Thus, we always use while(condition)
to replace if(condition)
to check if condition is expected under Mesa semantics (because all computer systems are built under Mesa semantics)
-
2 condtions:
empty
is the queue where producers sleep;fill
is the queue whereconsumer
sleep and wait to consume. Whencount == 0
which means no data in buffer, then thread send signal toempty
to wake up producer. Vice Versa. -
Why we have 2 condtions? because consumer should only wake up producer, vice versa. Condition conceptually is like a queue, and decide which specific thread should wake up another specific thread
put()
and get()
functions for data access:
1 int buffer[MAX];
2 int fill = 0;
3 int use = 0;
4 int count = 0;
5
6 void put(int value) {
7 buffer[fill] = value;
8 fill = (fill + 1) % MAX;
9 count++;
10 }
11
12 int get() {
13 int tmp = buffer[use];
14 use = (use + 1) % MAX;
15 count--;
16 return tmp;
17 }
Final Impelementation:
1 cond_t empty, fill;
2 mutex_t mutex;
3
4 void *producer(void *arg) {
5 int i;
6 for (i = 0; i < loops; i++) {
7 Pthread_mutex_lock(&mutex); // p1
8 while (count == MAX) // p2
9 Pthread_cond_wait(&empty, &mutex); // p3
10 put(i); // p4
11 Pthread_cond_signal(&fill); // p5
12 Pthread_mutex_unlock(&mutex); // p6
13 }
14 }
15
16 void *consumer(void *arg) {
17 int i;
18 for (i = 0; i < loops; i++) {
19 Pthread_mutex_lock(&mutex); // c1
20 while (count == 0) // c2
21 Pthread_cond_wait(&fill, &mutex); // c3
22 int tmp = get(); // c4
23 Pthread_cond_signal(&empty); // c5
24 Pthread_mutex_unlock(&mutex); // c6
25 printf("%d\n", tmp);
26 }
27 }
Lampson and Redell use only 1 condition in producer/consumer problem, but they wake up all sleeping threads to check which one is need. If a thread is not need, it go back to sleep imediately.
It's the single primitive for all things related to synchronization. We can use it as both locks and condtion varables. Invited by Edsger Dijkstra
sem_wait()
sem_post()
make the process to sleep/waken
1 #include <semaphore.h>
2 sem_t s;
3 sem_init(&s, 0, 1);
Methods definition
1 int sem_wait(sem_t *s) {
2 decrement the value of semaphore s by one
3 wait if value of semaphore s is negative
4 }
5
6 int sem_post(sem_t *s) {
7 increment the value of semaphore s by one
8 if there are one or more threads waiting, wake one
9 }
set semaphore initial value to be 1
1 sem_t m;
2 sem_init(&m, 0, X); // initialize semaphore to X; what should X be?
3
4 sem_wait(&m);
5 // critical section here
6 sem_post(&m);
set semaphore initial value to be 0
1 sem_t s;
2
3 void *
4 child(void *arg) {
5 printf("child\n");
6 sem_post(&s); // signal here: child is done
7 return NULL;
8 }
9
10 int
11 main(int argc, char *argv[]) {
12 sem_init(&s, 0, X); // what should X be?
13 printf("parent: begin\n");
14 pthread_t c;
15 Pthread_create(c, NULL, child, NULL);
16 sem_wait(&s); // wait here for child
17 printf("parent: end\n");
18 return 0;
19 }
While (STATUS == BUSY)
; // wait until device is not busy
Write data to DATA register
Write command to COMMAND register
(Doing so starts the device and executes the command)
While (STATUS == BUSY)
; // wait until device is done with your request
Simple but waste CPU time.
Send a request to device, then make the process sleep. When device finished its internal operation, device through a hardware interrupt. OS will handle with Interrupt Service Routine,ISR or interrupt handler.
However, intterput is not always optimal, if you have a high-performance device that do data operation very fast. Interrupt will make it slow.
Use polling and interrupt: polling for a little while, if device stilling doing the data operation, then interrupt.
Special device that help CPU to copy the device.
- Special instruction: priviledge instruction that can only performed by OS
- Memory-Mapped I/O, use memory and device as its address space
-
Provice abstraction, OS do not need how to operate device detail. The driver will do the detail
-
May contain more bugs, as the driver developers are generally not expert
-
Take a high percentage of OS size.
-
Assure atomic write of 512 bytes
-
Performance follows space and temporal locality
-
Platter: covered with magnetic film
-
Surface: double sides of a Platter
-
Spindle: mutiple platters construct a spindle
-
RPM(Rorations Per Minute): spindle spins in 7200~15000RPM generally
-
Track: surfaces that divided into rings make a track
-
Cylinder: stack of tracks(across platters)
-
Disk head: change magnetic film
-
Disk arm: move disk head to the expected location
- Track skew: sections are generally skewed(not aligned), to reduce rotation delay
- Mutile-zoned: outer tracks contains more sections
- Track buffer/Cache: 8/16MB
T_I/O = T_seek + T_rorate + T_transfer
principle of SJF,shortest job first
Implementation:
-
Shortest-Seek-Time-First, SSTF: simple and greedy
-
SCAN: prevent starving
-
Shortest Positioning Time First, SPTF: use physical layout of disk(consider rotation)
- Reliability
- Performance
- Capacity
- Full performance
- Low reliability
- Full performance on random read
- Half performance on other
- Half capacity
- Low performance(1 / 2) on random write
- Almost full performance (N -1) on other
- N - 1 capacity
- Fine performance (N / 4) on random write
- Other almost same like RAID-4
2 filename in directory pointing to the same inode(which means a real file)
Itself is a third type of file. Delete the source file may cause dangling reference
-
Use
4KB
block size(increase data transforming efficiency) -
Each
group
is organized likeVSFS
(Very Simple File System) -
Place file data in the same
group
-
Place files in the same directory together
-
The Large-file Exception : 4MB data in
indirect block
will be placed in differentgroup
a tool on Unix to check metadata consistency
-
data journaling
-
metadata journaling