- Rust provides cache-friendly data structures by default
- usually it's arrays that hold data within Rust programs, NOT nested tree structures created by pointers
- data-oriented programming
- Methods are always dispatched statically, unless explicitly requested to do so dynamically
- Utilities written in Rust are compiled as static binaries by default
- macros return code rather than value
- Rust does NOT have constructors
the convention of
newmethod implemented by types is not part of the Rust lang
- To compile a release build:
cargo run --release- release build is faster at runtime but slower to compile
- append
-qtocargoto run in quiet mode
-
default integer is
i32 -
default float is
f64 -
usizeandisize- pointer sized integers- assume CPU's native width
- 4 bytes on 32-bit arch
- 8 bytes on 64-bit arch
-
f32andf64- floating point -
0b- binary{:0b}
-
0x- hex{:0x}
-
0o- octal{:0o}
-
Conversions between types are always explicit
- use
as - or,
use std::convert::TryInto// try_into() returns an i32 wrapped in `Result` let num_ = num.try_into().unwrap();
- use
-
tuple -
(val, val, ...)- of different types
- 12 elements max?
- fixed sequences of values **on stack**
- to destructure:
let (a, b, c ...) = <some_tuple>;
-
array -
[val, val, ...]- collection of similar (same type) elements with fixed length known at compile time
- of type
[T;N] let arr = ['x'; 5];
-
slices - collection of similar elements with length known at runtime
-
as- numeric type conversions -
constants - ALL_CAP_SNAKE_CASE - must always have explicit types
-
::std::f64::INFINITY -
::std::f32::NAN -
booltakes an entire byte in memory -
charrepresented by single quotes''- unicode char
'\u{62}'
- unicode char
-
()is a unit - returning nothing- the
main()function by default returns()
- the
-
main()function can also return aResult
str- string slice - text with length known at runtime- cannot be expanded or shrank once created
- UTF-8 guaranteed
&str- contains reference tostrdata and a length- attempting to assign a variable to
strwill FAIL - Rust compiler wants to create fixed-sized variables within a function's stack frame
strvalues can be of arbitrary length - can only be stored as local variables by reference
Stringuses dynamic memory allocation- Creating
&stravoids a memory allocation Stringis an owned type- read-write
&stris a borrowed type- read-only
- string literals are
&str- full type:
&'static str
- full type:
charis ALWAYS 4 bytes- interanally encoded in UCS-4/UTF-32
- ALWAYS fixed-length - easier for the compiler
[u8]- slice of raw bytesVec<u8>- vector of raw bytesString<->Vec<u8>str<->[u8]- string formatting uses
{}as formatter std::ffi::OSString- platform-native string- not guaranteed to be UTF-8
- does NOT contain the zero byte
0x00
std::path::Path- filesys- a backslash
\escapes the newline character in the string literal Stringis guaranteed to be UTF-8- however reading a text file into
Stringmay cause errors if there are invalid bytes - Better - read in data as
[u8](slice ofu8values) then decode those bytes
- however reading a text file into
- Implemented in base 2, but often used in calculations in base 10
- In Rust,
f32andf64only implementstd::cmp::PartialEq- other numeric types also implement
std::cmp::Eq
- other numeric types also implement
- Guidelines:
- Avoid testing floating-point numbers for equality
- Be wary when results may be mathematically undefined
to_bits()- Safer to test within an acceptable margin, **epsilon**
f32::EPSILONandf64::EPSILONassert!(abs_diff <= f32::EPSILON);
NAN- Not A Number- NEVER equal to each other
- use
is_nan()andis_finite()- they crash
- array - fixed size and extremely light weight
- although possible to replace items within an array
- an array's type:
[T; n]
- vector - growable but incurs small runtime penalty
- In practice, most interactions with arrays occurs via another type, slice,
[T]- slice itself is interacted by reference
&[T]
- slice itself is interacted by reference
- slice - dynamically sized array-like objects
- size not known at compile time
- NOT resizable
- dynamic typing
[T; n]vs.[T]- create a slice from an array - cheap!
- can be used as a view on arrays (and other slices)
- size of slice itself is FIXED in memory: two
usizecomponents- pointer
- length
&[T]
- use tuple to return multiple values from a function
- to return nothing:
return ();-()is empty tuple - to print
(), use{:?}
'a,'b, etc- lifetime variables
i: &'a i32- binds lifetime variable'ato the lifetime ofiiis a reference to ani32with lifetimea
- All values bound to a given lifetime must live as long as the last access to any value bound to that lifetime
- Traits
fn add<T: std::ops::Add<Output = T>>(i: T, j: T) -> T {}- all Rust operations are defined within traits
- ALL Rust operations are systactic sugar for trait's methods
- During compilation,
a + bis converted toa.add(b)
for item in container-containerbecomes unavailable after theforloop ends- Three
forloops:for item in collection- Ownershipfor item in &collection- Read-onlyfor item in &mut collection- Read-write
n..m- exclusive rangen..=m- inclusive range- Try not to use index:
- performance:
collection[index]incurs runtime costs for bounds checking - safety
- performance:
loop- to loop forever- until a
breakor termination
- until a
- often for long-running servers
- loop label
- prefixed with
':
'outer: for x in 0.. { for y in 0.. { for z in 0.. { if x + y + z > 1000 { break 'outer; } } } }
- prefixed with
- Rust has no truth or falsey concepts
breakreturns a value - allows infinite loops to return valuesbreak 123;
match- safer?
- does NOT fall through - returns immediately when a match is found
- Use
clap - example usage:
cargo run -- <arg>./target/debug/<executable> --help
BufReader- provides buffered I/O
- can reduce system calls to OS if the hard disk is congested
#![allow(unused_variables)]#[allow(dead_code)]- a return type of
!indicates this function NEVER returns- when function is guaranteed to crash
panic!()- when there is an infinite loop
()- unit type- zero-length tuple
- the function returns no value
- expressions ended with
;returns()
#[derive(Debug)]- allows for printing for compound types
- works with
{:?} - the
Debugtrait is implemented for that type
String::from()generates owned strings from string literals, which are slices- accessing fields by reference prevents the use after move issues
- borrowing
- the
newtypepatternstruct SomeNewType(String);
Vec<T>::reserve(&mut self, addtional: uszie)- reservesadditionalmore elementsString::from_utf8_lossy(&buffer)- convertsVec<u8>toString- non-UTF8 bytes are replaced with
- every
structcan be instantiated through a literal syntax (with::new()) structfields are private by default, but can be accessed within the module that defines thestruct
-
In C:
errnoof-1- commonly accessed/modified by sys calls
-
In Rust:
static mut ERROR: i32 = 0;static- static lifetime valid for the life of the program
-
Accessing and modifying
static mutvariables requires the use of an unsafe blockunsafe { }
-
By convention. global variables use
ALL_CAPS -
constincluded for values that never change -
letvs.const- data behind
letcan change letrelates more to aliasing than immutability- aliasing: having multiple references to the same location in memory at the same time
- borrows (read-only references) to variables declared with
letcan alias the same data - mutable borrows (read-write references) are guaranteed to never alias data
- borrows (read-only references) to variables declared with
- data behind
Resulthas two states,OkErr
- requires an extra
unwrap()- will crash if unwrapping
Err
- will crash if unwrapping
- Calling
.unwrap()on aResultis poor style!
.splitn(X)- splits to at mostXpartsenumsupportsimpl
- Allowing multiple types to implement a certain
traitenables code reuse and allows the Rust compiler to perform zero-cost abstraction- by not adding extra data around values within
strcuts
- by not adding extra data around values within
println!,print!,write!,writeln!,format!- rely on
DisplayandDebugtraits {}
- rely on
- a
pub struct's fields remain private - an
pub enum's variants are public pub struct's methods should also be markedpub
///or//!- Use
cargo doc --opento generate doc for the project and open it in web browser - Use
cargo doc --no-depsto not include dependencies
- move - movement of ownership
- Every value in Rust is owned
- Types implementing
Copytrait are duplicated at times that would otherwise be illegal - primitive types possess copy semantics
- all other types have move semantics
- Owner cleans up when its values' lifetimes end
Drop- to implement custom destructor
drop(&mut self)
- Values cannot outlive their owner
- Use references where full ownership is not required
- Duplicate the value
- Refactor the code to reduce number of long-lived objects
- Wrap your data in a type designed to assist with movement issues
- Two modes of duplication: clone vs. copy, by different traits
std::clone::Clonestd::marker::Copy- copy is implicit - whenever ownership would otherwise be moved to an inner scope
- always fast and cheap
- always identical - copies are bit-for-bit duplicates
- clone is explicit -
.clone()- may be slow and expensive
- Why not always
Copy?- not for non-negligible performance impact
- not working perfectly on references
- some types overload the
Clonetrait
- If implementing
Copymanually, you would also need to implementClone
- Will incur runtime costs
std::rc::RcRc<T>- a reference-counted value of type
T - provides shared ownership of
T - prevents
Tfrom being removed from memory until every owner is removed - implements
Clone - does NOT allow mutation!
- to allow mutation, use
Rc<RefCell<T>>let mut mutable_base = <base>.borrow_mut()- allowing mutation
- could be an alternative if implementing
Cloneis prohibitively expensive - NOT thread safe!!!
- In multithreaded code, replace with
Arc<T>andArc<Mutex<T>>
- a reference-counted value of type
- Text files are just binary files that happen to follow a consistent mapping between bit strings and characters
- encoding
let a: f32 = 42.42;
let frankentype: u32 = unsafe {
std::mem::transmute(a)
};{:032b}- directive for theprintln!()macro- left-pad with 32 zeros
- right-hand
binvokes thestd::fmt::Binarytrait
{}invokesstd::fmt::Displaytrait{:?}invokesstd::fmt::Debugtraitstd::mem::transmute()- naively interpret anf32asu32without affecting any of the underlying bits
#[allow(arithmetic_overflow)]
fn main() {
let (a, b) = (200, 200);
let c: u8 = a + b;
println!("200 + 200 = {}", c);
}- code below compiles if specifying the
-Oflag:rustc -O- but gives the wrong answer, 144
- how bytes layout in CPU
- Big Endian - most significant on the left
- Little Endian - most significant on the right
- Integers are almost certainly stored in little endian
- Each floating-point number is laid out in memory as scientific notation
- Consists of:
- sign bit
- exponent
- mantissa
- radix, which is
2
$(-1)^{signbit} \times mantissa \times Radix^{exponent - bias}$ - allows for both
$0$ and$-0$ - To isolate the sign bit,
>> 31 - To isolate the exponent:
>> 23- AND mask
& 0xffto exclude the sign bit - Decode - subtract
127
- To isolate mantissa,
& 0x7fffff- calculate weight
- Useful for representing fractions
- An option for performing calc on CPUs w/o a floating point unit (FPU)
- microcontrollers
- Loses accuracy, saves significant space
- **The Q Format** - fixed-point number format using a single byte
- Q7
- 7 bits available for representing number and 1 bit for sign
i8
- tuple struct - struct created from unnamed fields
struct Q(i8);
PartialEq- can be compared using==Eq- any possible values of the type can be compared against any other possible values of the same typeimpl From<T> for Ustd::convert::Fromstd::convert::TryFrom
- Division is a slow operation
- CHIP-8
- An operation (op)
- procedures supported natively by the system
- implemented in hardware
- intrinsic operation
- Register
- containers of data that can be directly accessed by CPU
- for CHIP-8, each register is
u8
- opcode - a number that maps to an operation
- on CHIP-8, opcodes include both:
- operation
- operands' registers
- on CHIP-8, opcodes include both:
- Steps to perform addition:
- Initialize CPU
- Load
u8values into registers - Load the addition opcode into
current_operation - Perform the addition
- Emulator:
- Reads the opcode
- Decodes instruction
- Matches decoded instruction to known opcodes
- Dispatches execution of the operation to a specific function
- How to interpret CHIP-8 codes
u16values made up of 4 nibbles- nibble - half a byte
- Each opcode (
u16) is made up of two bytes:- high byte
- low byte
- Each byte is made up of two nibbles
- high nibble
- low nibble
high byte low byte
\/ \/ \/ \/
0 x 7 3 E E
^ ^
high nibble low nibble
- can execute several instructions in sequence
- 4K memory
- opcode of
0x0000indicates stopping the CPU - Limitations:
position_in_memoryis more formally referred to as program counter- CHIP-8 specifications reserves the first 512 bytes (
0x100) for the system
- Use the last register as a carry flag
- an operation has overflowed the
u8register size
- an operation has overflowed the
- Adds the ability to call functions
- NO programming language support
- any programs still need to be written in binary
- Support for a stack - a specialized memory to store the following addresses:
- the CALL opcode -
0x2nnnnnnis the memory address- sets
position_in_memorytonnn, the address of the function
- the RETURN opcode -
0x00EE- sets
position_in_memoryto the memory address of the previous CALL opcode
- sets
- the CALL opcode -
- Function: a sequence of bytes that can be executed by a CPU
copy_from_slice(&some_slice)- Calling a function:
- Store the current memory location on the stack
- Increment the stack pointer
- Set the current memory location to the intended memory address
- Returning from a function:
- Decrement the stack pointer
- Retrieve the calling memory address from the stack
- Set the current memory location to the intended memory address
cargo build --releasedisables runtime checks- integer overflows/underflows could happen!
- Most commonly referred to as
&Tand&mut T- also
Option<T>- using null pointer optimization to ensure thatOption<T>occupies 0 bytes in the compiled binary Noneis represented by a null pointer - pointing to invalid memory
- also
- address space - retrieval system to make use of physical RAM
- encoded in
usize
- encoded in
- pointer vs. Rust's reference
- references always refer to valid data
- references are correctly aligned to multiple of
usize- Rust may itself include padding bytes
- references are able to provide guarantees above ^ for dynamically sized types
- for types with no fixed width in memory, Rust ensures that a length is kept alongside the internal pointer
- Three types of pointer types in Rust:
- References
- Pointers
- Raw pointers
Cow- copy on write- handy when an external source provides a buffer
- avoids copies - increased runtime performance
ffi- foreign function interface
- raw pointer - memory address without Rust's standard guarantees
- unsafe
- could be null
- dereferencing must occur in
unsafeblock *const T*mut T- a raw pointer to
String:*const String - a raw pointer to
i32:*mut i32 - Rust references (
&Tand&mut T) compile down to raw pointers - always point to the starting byte of
T - always know the width of type
Tin bytes - Use
std::mem::transmuteto cast a reference to a value as a raw pointer- highly unsafe
- raw pointers do NOT own their values
- no compiler checks
- multiple raw pointers to the same data are allowed
- Reasons to use raw pointers:
- it's unavoidable
- shared access to something is essential and runtime performance is paramount
- smart pointers
- wrap raw pointers
- with added semantics
- fat pointer - refers to memory layout
- thin pointers, such as raw ones, are a single
usizewide - fat pointers are two
usizewide or more
- thin pointers, such as raw ones, are a single
Box<T>stores a value on heapArc<T>is thread safe but adds runtime cost- Rust pointer types are extensible
core::ptr::Unique- basis for types such asStringBox<T>Vec<T>
core::ptr::Shared- basis forRc<T>Arc<T>
std::rc::Weak&std::arc::Weak- for deeply interlinked data structures
- for single threaed vs. multi threaded programs
- allow access to data within
Rc/Arcwithout incrementing its reference count - prevents never-ending cycles of pointers
alloc::raw_vec::RawVec- underlies
Vec<T>andVecDeq<T> - expandable, double-ended queue
- underlies
std::cell::UnsafeCell- basis of
Cell<T>andRefCell<T> - for interior mutability
- basis of
- stack vs. heap
- stack is fast;
- heap is slow;
- To place data on stack, the compiler must know the type's size at compile time
- in Rust - use types that implement
Sized
- in Rust - use types that implement
Stack grows downwards with decreasing values
|
|
|
|
|
\/
/\
|
|
|
|
|
Heap grows upwards with increasing values
- stack frame, a.k.a. allocation record, a.k.a. activation frame
- created as function calls are made
- contains a function's state during the call
- arguments
- pointer to the original call site
- local variables (except for the data allocated on heap)
- every stack frame is different in size
- A cursor within the CPU updates to reflect the current address of the current stack frame
- stack pointer
- stack pointer decreases in value as stack grows
- functions are called within functions
- stack pointer increases in value as function returns
- Stack contains two levels of objects:
- stack frames
- data
- stack allows access to multiple elements stored within, not just the top item
- stack can include elements of arbitrary size
Stringvs.&str- different representations in memory
&str- allocated on stackStringallocates memory on heap- functions only accepting
Stringwill not work with&str - use generics!
/* takes an argument `x` of type `T`, where `T` implements `AsRef<str>` `AsRef<str>` _behaves as_ a reference to `str` _even if it is not_ */ fn foo<T: AsRef<str>> (x: T) -> bool { x.as_ref().len() > 4 }
- use implicit conversion to read&write - since
&stris immutable
/* accept _only_ types that can be converted to `String` */ fn foo<T: Into<String>>(x: T) -> bool { // performs the conversion within the function // performs any logic to the newly created `String` x.into().len() > 4 }
- an area of program memory for types that do not have known size at compile time
- Some types grow/shrink overtime
StringVec<T>
- Some types unable to tell compiler how much memory to allocate even though these do not change size over time
- slices (
[T]) - have NO compile time length- internally they are a pointer to some part of an array
- trait object
- slices (
- Variables on the heap MUST BE accessed via a pointer
- the
Box<T>value (e.g.Box::new(42)) can only be accessed via a pointer
- the
Box::new(T)- allocatesTon the heap- something that has been boxed live on heap, with a pointer to it on stack
std::mem::drop()deletes the object before its scope ends- types implementing
Drophas adrop()method, BUT it is ILLEGAL to explicitly call it within user code - the boxed value has NOT been deleted from heap,
- but the memory allocator has marked that heap location as free for reuse
- types implementing
- Dynamic Memory Allocation - program asks more memory from OS
- request memory from OS via a system call -
alloc()orHeapAlloc() - use the allocated memory
- release the unneeded memory back to OS via
free()/HeapFree()
- request memory from OS via a system call -
- Allocator - the intermediary between program and OS
- Accessing data on stack is fast:
- a function's local variables (on stack) reside next to each other in RAM - contiguous layout
- cache friendly
- Accessing data on heap is slow:
- Variables on heap are unlikely to reside next to each other
- dereferencing the pointer involves table lookup and trip to main memory
- Data allocated on stack must stay the same size during the lifetime of the program
- Memory allocation speed is NOT well-correlated with allocation size
- although larger memory allocation do tend to take longer, it is NOT guaranteed
- To minimize heap allocations:
- Use arrays of uninitialized objects
- instead of creating objects from scratch as required, create a bulk lot of those with zeroed values
- when it's time to activate one of the objects, set its value to non-zero
- can be DANGEROUS
- Use an allocator tuned for the application's access memory profile
- Investigate
arena::Arenaandarena::TypedArena- able to create objects on the fly
- but
alloc()andfree()are only called when the arena is created/destroyed
- Use arrays of uninitialized objects
- Page - fixed-size block of words in real memory, typically 4KB in size on 64-bit archs
- Word - any type that is size of a pointer
- corresponds to width of the CPU's registers
usizeandisizein Rust
- Page fault
- error raised by CPU when a valid memory address is requested that is not currently in a physical RAM
- signals OS that at least one page must be swapped back into memory
- Swapping - migrating a page of memory stored temporarily on disk from main memory upon request
- Virtual memory - OS view of the physical memory available on the system
- Page table - data structure maintained by OS to manage translating virtual -> real memory
- Segment - block within virtual memory
- virtual memory is divided into blocks to minimize space required to translate virtual <-> real memory
- Segmentation fault - error raised by CPU when an illegal memory address is requested
- MMU - component of CPU that manages memory address translation
- maintains a cache of recently translated addresses (TLB)
Box::into_raw()- Accessing data in a program requires virtual addresses - the ONLY addresses accessible by the program
- translation performed by CPU
- instructions stored in OS
- **Memory Management Unit** (MMU) - within CPU to perform translation
- every virtual address is mapped to a physical address
- instructions stored in a pre-defined address in memory
- in the worst case, every attempt at accessing memory addresses incurs two memory lookups
- **Translation Lookaside Buffer** (TLB)
- cache of recently translated addresses, maintained by CPU
- has its own (fast) memory
- typically around 100 pages on x86 archs
- reaching its capacity can be costly
- virtual addresses are grouped into blocks, pages
- typically 4KB in size
- avoids memory fragmentation
- Having a virtual address space allows OS to overallocate
- Inactive memory pages can be swapped to disk in a byte-for-byte manner, until it's requested by the active program
- often used during high contention for memory
- Compression, among other size optimizations, can be performed
- Paging can speed up the loading of shared libraries
- Paging adds security between programs
- OS can add other attributes
- Keep hot working portions of the program within 4KB of size
- If 4KB is unreasonable, keep under 4KB * 100
- so that CPU can maintain its TLB in good order
- Avoid deeply nested data structures with pointer spaghetti
- Test the ordering of nested loops
kernel.dllon Windows