webcachesim2

A simulator for CDN caching and web caching policies.

Simulate a variety of existing caching policies by replaying request traces, and use this framework as a basis to experiment with new ones. A 14-day long Wikipedia trace is released alongside the simulator.

The webcachesim2 simulator was built to evaluate the Learning relaxed Belady algorithm (LRB), a new machine-learning-based caching algorithm. The simulator build on top of webcachesim, see References for more information.

Currently supported caching algorithms:

Learning Relaxed Belady (LRB)
LR (linear-regression based ML caching)
Belady (heap-based)
Belady (a sample-based approximate version)
Relaxed Belady
Inf (infinite-size cache)
LRU
B-LRU (Bloom Filter LRU)
ThLRU (LRU with threshold admission control)
LRUK
LFUDA
S4LRU
ThS4LRU (S4LRU with threshold admission control)
FIFO
Hyperbolic
GDSF
GDWheel
Adaptive-TinyLFU (via Java library integration)
LeCaR
UCB
LHD
AdaptSize
LFO
Random (random eviction)

Configuration parameters of these algorithms can be found in the config file config/algorithm_params.yaml

The prototype implementation on top of Apache Traffic Server is available here.

Trace

The Wikipedia trace download link. To uncomress:

tar -xzvf wiki2018.tr.tar.gz

Trace Format

Request traces are expected to be in a space-separated format with 3 columns and additionally columns for extra features.

time should be a long long int, but can be arbitrary (for future TTL feature, not currently in use)
id should be a long long int, used to uniquely identify objects
size should be uint32, this is object's size in bytes
extra features are optional uint16 features. LRB currently interprets them as categorical features (e.g., object type).

time	id	size
1	1	120
2	2	64
3	1	120
4	3	14
4	1	120

Simulator will run a sanity check on the trace when starting up.

Installation

For ease of use, we also provide a docker image which contains the simulator. Our documentation assumes that you use this image. To run it:

 docker run -it -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim:v0.1 ${traceFile} ${cacheType} ${cacheSize} [--param=value]

Alternatively, you may follow the instruction to manually install the simulator.

Using an existing policy

The basic interface is

./webcachesim_cli traceFile cacheType cacheSize [--param=value]

where

traceFile: a request trace (see trace format)
cacheType: one of the caching policies
cacheSize: the cache capacity in bytes
param, value: optional cache parameter and value, can be used to tune cache policies

Global parameters

parameter	type	description
bloom_filter	0/1	use bloom filter as admission control in front of cache algorithm
dburi, dbcollection	string	upload simulation results to mongodb
is_metadata_in_cache_size	0/1	deducted metadata overhead from cache size
n_early_stop	int	stop simulation after n requests, <0 means no early stop

Examples

Running single simulation

Running LRU on Wiki trace 1TB

docker run -it -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim:v0.1 wiki2018.tr LRU 1099511627776

# running sanity check on trace: /trace/wiki2018.tr
# ...
# pass sanity check
# simulating
# segment id: 0
# ...
# results will be print in json string. Byte miss and byte req are aggregated in segment_byte_req, segment_byte_miss.
# The default segment size is 1 million request. This allows to calculate final byte miss ratio with your warmup length.
# LRB evaluation warmup length is in the NSDI paper.
# Alternatively, check no_warmup_byte_miss_ratio for byte miss ratio without considering warmup.

Running B-LRU on Wiki trace 1TB

docker run -it -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim:v0.1 wiki2018.tr LRU 1099511627776 --bloom_filter=1

Running LRB on Wiki trace 1TB

docker run -it -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim:v0.1 wiki2018.tr LRB 1099511627776 --memory_window=671088640

LRB memory window for Wikipedia trace different cache sizes in the paper (based on first 20% validation prefix):

cache size (GB)	memory window
64	58720256
128	100663296
256	167772160
512	335544320
1024	671088640

Automatically tuning LRB memory window on a new trace

LRB_WINDOW_TUNING.md describes how to tune LRB memory window on a new trace.

Contributors are welcome

Want to contribute? Great! We follow the Github contribution work flow. This means that submissions should fork and use a Github pull requests to get merged into this code base.

There are a couple ways to help out.

Documentation and use cases

Tell us how you use webcachesim or how you'd want to use webcachesim and what you're missing to implement your use case. Feel free to create an issue for this purpose.

Bug Reports

If you come across a bug in webcachesim, please file a bug report by creating a new issue. This is an early-stage project, which depends on your input!

Write test cases

This project has not be thoroughly tested, any test cases are likely to get a speedy merge.

Contribute a new caching policy

If you want to add a new caching policy, please augment your code with a reference, a test case, and an example. Use pull requests as usual.

References

We ask academic works, which built on this code, to reference the LRB/AdaptSize papers:

Learning Relaxed Belady for Content Distribution Network Caching
Zhenyu Song, Daniel S. Berger, Kai Li, Wyatt Lloyd
USENIX NSDI 2020.

AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN
Daniel S. Berger, Ramesh K. Sitaraman, Mor Harchol-Balter
USENIX NSDI 2017.

License

We include 3rd party library source code in lib folder for better code isolation, and they may have different licenses:

libbf: BSD-3-Clause
LightGBM: MIT
Mongodb c driver: Apache-2.0
Mongodb cxx driver: Apache-2.0
sparsepp: BSD-3-Clause
tinylfu: Apache-2.0

suhjohn/lrb