This repo contains a few scripts to collect DNS lookup-times, aggregate the times to average times, infer some statistics about the times collected and create DNS lookup workload traces.
The dns-timing.py scripts use bind for the DNS lookups cache, specifically it uses the dig command. You must have bind installed on your system to run this script.
Also the following python external libraries are being used:
- pause
- matplotlib
- numpy
you can use the requirments file to install the required libraries.
Just run the following:
pip install -r requirments.txt
In case this is the first time you use this repo, the general workflow is:
- Choose your urls to time, we used the 1M_webrank.
- Run dns-timing.py on how many vms you wish to use, we used 6, running on the chosen urls from (1). you may also choose the number of iterations to time all the urls, including delays between iterations.
- After the times were collected you can use find-invalid.py, which finds suspected urls for being invalid, and you can make a new filtered urls list file.
- Now you can use trace-maker.py using either the filtered or the un-filtered urls + the times collected.
- You can also gather some statistics on the times collected using timing-stats.py
This was our general workflow creating our workloads traces.
All scripts has --help or -h flag to get usage of the script.
Examples of getting usage and flags info below,
> python dns-timing.py -h
> usage: dns-timing.py [-h] [-input INPUT] [-dd D] [-hd H] max_vm vm iters
Collects dns lookup times for cache miss and hit from list of urls.
positional arguments:
max_vm a positive integer for the max number of vms.
vm an integer for the vm number.
iters number of iterations to run the script over the segment of
urls.
optional arguments:
-h, --help show this help message and exit
-input INPUT path to input file containing urls to time (default:
"1M_webrank").
-dd D an integer for the iteration delay in days (default: 1).
-hd H an integer for the iteration delay in hours (default: 1).
> python timing-stats.py -h
> usage: timing-stats.py [-h] [-basedir BASEDIR] [-vm_num VM_NUM]
[-o OUTPUT_PATH] [-i INPUT_PATH] [--logscale]
Displays statistics about times collected
optional arguments:
-h, --help show this help message and exit
-basedir BASEDIR path to base directory containing collected times
(default: "data").
-vm_num VM_NUM a positive integer for the used number of vms.
-i INPUT_PATH path for aggregated times input file, if not given
aggregation is done from files in basedir.
--logscale flag to make histogram plot y-scale be in log-scale.
> python find-invalid.py -h
> usage: usage: find-invalid.py [-h] [-filename FILENAME] [-basedir BASEDIR]
Displays suspect urls from file that might be invalid and creates a new urls
list file on demand.
optional arguments:
-h, --help show this help message and exit
-filename FILENAME path to input file containing urls to time (default:
"1M_webrank").
-basedir BASEDIR path to base directory containing collected times
(default: ".").
> python trace-maker.py -h
> usage: trace-maker.py [-h] [-input INPUT] [-length L] [-num N]
[-vm_num VM_NUM] [-basedir BASEDIR] [-r_hit R_HIT]
[-r_miss R_MISS] [--replace]
Creates new random traces from input file
optional arguments:
-h, --help show this help message and exit
-input INPUT path to input file containing urls to time (default:
"filtered_1M_webrank").
-length L an integer for the number of requests in trace (default:
5e7).
-num N an integer for the number of traces to create (default:
1).
-vm_num VM_NUM a positive integer for the used number of vms.
-basedir BASEDIR path to base directory containing collected times
(default: "data").
-r_hit R_HIT value for hit time in case of replace=True and negative
time found (default: 1000).
-r_miss R_MISS value for miss time in case of replace=True and negative
time found (default: 10000).
--replace if this flag is used then negative times will be replaced.
All of the available data files for this repo can be found here.
Times collected from Ben-Gurion University
Average times files