gcc -O0 -Wall -g -o segment_map -pie -fpic segment_map.c -lpthread setcap cap_sys_resource=ep mov_code_segment mmap.c is a microbenchmark written to determine mmap, mprotect latencies 1. Process pinned to a random core 2. Anonymous region of specified size (4KB, 400KB, 4MB, 400MB, 4GB, 40GB) mmap-ed 3. Every page in region untouched/read/written 4. Mprotect called to update region permissions to none Things to watch out for- Transparent Huge Pages - enable/disable using /sys/kernel/mm/transparent_hugepage/enabled Kernel same-page merging? Clear Page cache- sync; echo 1 | sudo tee /proc/sys/vm/drop_caches Turn off hardware prefetechers using msr tool https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors 0x1a4 sudo rdmsr 0x1a4 Value of 0 means all h/w prefetchers enabled Setting the bits to 1 will disable prefetch sudo wrmsr 0x1a4 0xf To verify use perf- (from man perf-list) If the Intel docs for a QM720 Core i7 describe an event as: Event Umask Event Mask Num. Value Mnemonic Description Comment A8H 01H LSD.UOPS Counts the number of micro-ops Use cmask=1 and delivered by loop stream detector invert to count cycles raw encoding of 0x1A8 can be used perf stat -e r1a8 -a sleep 1 perf record -e r1a8 ... Frequency Scaling - ondemand frequency scaling means variable performance so use performance governor https://wiki.archlinux.org/index.php/CPU_frequency_scaling cpupower frequency-set -g governor If using Ftrace, follow the following steps echo 0 | sudo tee /sys/kernel/debug/tracing/tracing_on echo function_graph | sudo tee /sys/kernel/debug/tracing/current_tracer use filefrag to find how many extents in a file filefrag <file>
swapnilh/os_mm_studies
Collection of micro-benchmarks to evaluate the overheads of memory management in the kernel
Jupyter Notebook