This is the profiler framework used in the paper "ADOC: Automatically Harmonizing Dataflow Between Components in Log-Structured Key-Value Stores for Improved Performance".
You can use this framework to track the following real-time metrics:
Metrics | Measure Tool | Content | Result File |
---|---|---|---|
IO information | iostat | CPU utilization (IO process only) and bandwidth, sampled every second | text file named IOSTAT.txt |
process information | pidstat | CPU utilization (sys, usr, total) and disk info (bytes read/written) | stat_result.csv |
real-time throughput | db_bench | time elapsed and throughput | report.csv |
output of db_bench | N/A | workload information, performance summary, level states, etc. | stdout.txt |
*perf output | perf | execution trace | perf.out |
* The usage of perf is shown in the example directory, but you need to configure it yourself. It is not embedded automatically, for both performance and file-size reasons.
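
As a rough illustration of how the throughput file could be post-processed, the sketch below averages the per-interval throughput from report-style CSV text. The column names `secs_elapsed` and `interval_qps` are an assumption based on the header that stock db_bench writes to its report file. The helper `mean_throughput` and the sample values are hypothetical, so adjust them if your report.csv differs.

```python
import csv
import io


def mean_throughput(csv_text, time_col="secs_elapsed", qps_col="interval_qps"):
    """Return (last elapsed second, mean ops/sec) from report-style CSV text.

    The default column names are an assumption, not confirmed by this framework.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(float(r[time_col]), float(r[qps_col])) for r in reader]
    if not rows:
        raise ValueError("no samples found")
    total_qps = sum(qps for _, qps in rows)
    return rows[-1][0], total_qps / len(rows)


# Made-up sample data, only to show the expected layout.
sample = "secs_elapsed,interval_qps\n1,1000\n2,3000\n3,2000\n"
elapsed, mean_qps = mean_throughput(sample)
print(f"ran for {elapsed:.0f}s, mean throughput {mean_qps:.0f} ops/sec")
```

To use it on a real run, read the file first, for example `mean_throughput(open("report.csv").read())`.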
There are also some other things you can use, for example:
- We embedded the `cgroup` tool in the system, so you can:
  - limit the CPU clock number to control the vCPU used by db_bench (set in default.ini)
  - limit the bandwidth by `bytes_wrote_in` (see the example "bandwidth_influence")
  - make further modifications to take full advantage of the `cgroup` tool
- You can use the `db_bench_dynamic_runner` to simulate the following scenario:
  - Your db_bench is running alongside another program with higher throughput (the competing throughput is generated from Alibaba's workload trace, using only the first hour of machine No. 48)
- If you are interested in the impact of each parameter, try the parameter_influence example. We will upload the ANOVA test script later, so that you can use one-way ANOVA to analyze the impact of different parameters and pick the most important ones. This feature is inspired by the Rafiki paper.
Warning!!!
The result files can be very large. Use the commands `sudo gzip **/LOG*` and `sudo gzip **/iostat*` to compress the oversized files.
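
If you prefer to compress from Python, for example at the end of an experiment script, a minimal sketch might look like the following. The helper `compress_logs` is hypothetical and not part of the framework. It searches every directory level under `root`, which may be broader than the shell glob above, and it needs write permission on the result files.

```python
import gzip
import shutil
from pathlib import Path


def compress_logs(root, patterns=("LOG*", "iostat*")):
    """Gzip every matching file under root and delete the uncompressed original."""
    compressed = []
    for pattern in patterns:
        # sorted() materializes the matches before any file is changed.
        for path in sorted(Path(root).rglob(pattern)):
            if not path.is_file() or path.suffix == ".gz":
                continue  # skip directories and files that are already compressed
            target = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()
            compressed.append(target)
    return compressed
```

Calling `compress_logs("results")` would return the list of `.gz` files it created, where `results` stands for whatever directory holds your output.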
- Download RocksDB and compile db_bench
- Modify default.ini and set the db_bench path
  - You can always override the path within the running script
- This framework was designed for evaluating the impact of thread number and batch size (the common size of Memtable and SSTable), but you can always change the configuration in config.json
- You will need several Python packages and the following system tools:
  - iostat
  - pidstat
  - top
  - perf
  - cgroup
- Please download the flame graph tool in this link if you want to plot the flamegraph
- If you are interested, you can visit the plot script in this link
After you have installed all the packages, create a directory and create a DB_launcher class to run your experiments. Refer to the following examples for further details.
dir name | usage |
---|---|
bandwidth_influence | use cgroup to limit the available bandwidth |
parameter_influence | traverse all options and use the ANOVA method to evaluate the impact of each parameter |
rate-limited-fillrandom | run the fillrandom workload with a rate-limiter in db_bench |
fillrandom | the basic usage, run fillrandom and monitor the resource usage |
white_noise_fillrandom | run fillrandom with varying bandwidth, the bandwidth follows a sine function |
on_cpu_analysis | run fillrandom and save the perf results |
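
To make the white_noise_fillrandom idea concrete, here is a small sketch of a bandwidth limit that follows a sine function over time. It does not reproduce the example's actual code: the function `sine_bandwidth` and its parameters (base, amplitude, period, and the MB/s unit) are all assumptions chosen for illustration.

```python
import math


def sine_bandwidth(t_seconds, base_mbps=100.0, amplitude_mbps=50.0, period_seconds=60.0):
    """Return the bandwidth limit at time t, oscillating around base_mbps."""
    phase = 2 * math.pi * t_seconds / period_seconds
    return base_mbps + amplitude_mbps * math.sin(phase)


# Sample the limit every 15 seconds over one full period.
schedule = [round(sine_bandwidth(t), 1) for t in range(0, 61, 15)]
print(schedule)
```

Each sampled value could then be applied as the current limit, for instance through the cgroup settings used in bandwidth_influence.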