Reimplement the YCSB (Yahoo! Cloud Serving Benchmark) in C++.
This project is meant for experimenting with caching layers for the YCSB benchmark.
Currently supported databases:
- LevelDB
- MySQL
- MongoDB
- Sqlite3
The cache layers' implementations can be found at KIDSSCC/cachelibArchive, which is based on cachelib.
build and install LevelDB
git clone --recurse-submodules https://github.com/google/leveldb.git
cd leveldb
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && cmake --build .
cmake --install .
install MySQL server and mysql-connector-cpp
you can use apt-get to install
sudo apt-get install mysql-server libmysqlcppconn-dev
then configure MySQL server as you like
note in CMakeLists.txt, the libmysqlcppconn is hardcoded to /usr/include/cppconn, you may need to change it to your own path
follow the instructions here to install MongoDB, and instructions here to install the MongoDB C++ driver
install sqlite3 and libsqlite3-dev using apt (note: multithreaded version is required)
sudo apt install sqlite3 libsqlite3-dev
this repository includes a fake header of cache layer, so you can build the project without the KIDSSCC/cachelibArchive, to build the project with the cache layer, follow the instructions from the KIDSSCC/cachelibArchive
git clone https://github.com/SynodicMonth/YCSB-CPP.git
cd YCSB-CPP
mkdir -p build && cd build
cmake .. && cmake --build .
Edit `src/benchmarks/config.h to configure the backend and benchmark settings, then compile the project.
cd build
cmake .. && cmake --build .
Then run the benchmark using command below, the benchmark will prepare the database and run the workload
./YCSB-CPP --threads <num_threads>
To enable the cache layer, use
./YCSB-CPP --threads <num_threads> --cache
If you want to seperate the prepare and run phase, use
./YCSB-CPP --prepare
./YCSB-CPP --run --threads <num_threads> --cache
Note: if the benchmark inserts data in the runnning phase, it will automatically remove the data inserted when the program exits. But if its interrupted or crashed, the data will remain in the database, so you need to prepare the database again.
The benchmark settings can be configured in src/benchmarks/config.h
You can change the backend by defining the BACKEND
macro to the backend you want to use
// #define BACKEND MySQLBackend
// #define BACKEND MongoDBBackend
#define BACKEND LevelDBBackend
// #define BACKEND SQLiteBackend
These macros control how the benchmark will run
#define MAX_RECORDS 1000 // number of records in the database
#define MAX_FIELDS 10 // number of fields in each record
#define MAX_FIELD_SIZE 100 // size of each field (in chars)
#define MAX_QUERIES 100000 // number of queries to execute
#define QUERY_PROPORTION 0.98 // proportion of read queries, the rest are insert queries
There are 5 distributions for the queries, DISTRIBUTION_ZIPFIAN
, DISTRIBUTION_UNIFORM
, DISTRIBUTION_SEQUENTIAL
, DISTRIBUTION_HOTSPOT
, DISTRIBUTION_EXPONENTIAL
, you can change the distribution by defining the DISTRIBUTION
macro
#define DISTRIBUTION DISTRIBUTION_ZIPFIAN // distribution of the keys
some distributions require additional parameters, you can change the parameters accordingly
#if DISTRIBUTION == DISTRIBUTION_ZIPFIAN
#define ZIPFIAN_SKEW 0.99
#elif DISTRIBUTION == DISTRIBUTION_HOTSPOT
// p(x \in [0, HOTSPOT_PROPORTION * N)) = HOTSPOT_ALPHA
// p(x \in [HOTSPOT_PROPORTION * N, N)) = (1 - HOTSPOT_ALPHA)
#define HOTSPOT_PROPORTION 0.2
#define HOTSPOT_ALPHA 0.8
#elif DISTRIBUTION == DISTRIBUTION_EXPONENTIAL
#define EXPONENTIAL_LAMBDA 0.1
#endif
Note: We define hotspot as a piecewise function, where the first HOTSPOT_PROPORTION
of the keys have a probability of HOTSPOT_ALPHA
and the rest have a probability of 1 - HOTSPOT_ALPHA
. Which can be fomulated as:
$$
P(x) = \begin{cases}
\alpha & \text{if } x \in [0, Prop \times N) \
1 - \alpha & \text{if } x \in [Prop \times N, N)
\end{cases}
$$
, where