Stochastic Database Cracking

This is the source code for the experiments in the Stochastic Database Cracking paper.

To run a particular algorithm on a particular dataset, execute:

./run.sh [data] [algo] [nqueries] [workload] [selectivity] [update] [timelimit]

[data] is one of the following:

  • 100000000.data
  • skyserver.data (downloaded on demand, around 2.2GB)

[algo] is one of the following:

  • crack
  • sort
  • scan
  • ddc
  • ddr
  • dd1c
  • dd1r
  • mdd1r
  • mdd1rp1
  • mdd1rp5
  • mdd1rp10
  • mdd1rp50
  • naive_r1th
  • naive_r2th
  • naive_r4th
  • naive_r8th
  • naive_r1x
  • naive_r2x
  • aicc
  • aicc1r
  • aics
  • aics1r
  • aiss

[nqueries] is an integer denoting the number of queries to be executed.

[workload] is one of the following:

  • Random
  • Sequential
  • SeqNoOver
  • SeqRevOver
  • SeqAlternate
  • SeqRand
  • ZoomIn
  • SeqZoomIn
  • ZoomOut
  • SeqZoomOut
  • Skew
  • ConsRandom
  • SkyServer (downloaded on demand)

[selectivity] is a floating-point number, e.g.:

  • 0.5 (means 50% selectivity)
  • 1e-2 (means 1% selectivity)
  • 1e-7 (means 0.00001% selectivity)
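
As a rough illustration of what these values mean in terms of result size, the sketch below converts a selectivity into an expected number of qualifying tuples. It assumes that selectivity is the fraction of the column that a query selects and that 100000000.data holds 100,000,000 tuples (inferred from the file name only). `expected_matches` is a hypothetical helper, not part of this repository.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of this repository): expected number of
// tuples a query returns, assuming selectivity is the selected fraction
// of the column.
std::int64_t expected_matches(std::int64_t n_tuples, double selectivity) {
    assert(n_tuples >= 0);
    assert(selectivity >= 0.0 && selectivity <= 1.0);
    // Round to the nearest whole tuple.
    return std::llround(static_cast<double>(n_tuples) * selectivity);
}
```

Under these assumptions, a selectivity of 1e-2 on 100,000,000 tuples corresponds to about 1,000,000 tuples per query, and 1e-7 to about 10.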

[update] is one of the following:

  • NOUP (means read-only queries)
  • LFHV (means low frequency high volume updates)
  • HFLV (means high frequency low volume updates)
  • ROLL (means queue-like update workload)
  • TRASH (means insert 1M tuples at 10, 10^5 th query)
  • DELETE (means delete 1000 tuples every 1000 queries)
  • APPEND (means gradually insert 10M tuples every 1000 queries)

[timelimit] is an integer denoting the maximum runtime in seconds; a run that exceeds it is terminated prematurely.

Example runs:

./run.sh 100000000.data crack 100000 Random 1e-2 NOUP 30
./run.sh skyserver.data dd1r 200000 SkyServer 1e-7 NOUP 60
./run.sh skyserver.data dd1r 200000 SkyServer 1e-7 HFLV 60

Download SkyServer dataset and queries

The SkyServer dataset consists of a sequence of 585634221 integers representing the right ascension (in degrees) from the Photoobjall table. Originally each value is a floating-point number between 0 and 360, but in this dataset it has been multiplied by 1 million and converted to an integer. Run the following command to get the dataset (the download may take a long time).

./run.sh get-skyserver-dataset
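
The fixed-point encoding described above can be sketched as follows. This is only an illustration: the helper names are hypothetical, rounding to the nearest integer is an assumption (the original conversion may truncate instead), and storing each value as a 32-bit integer is also an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scale factor stated in the text: degrees are multiplied by 1 million.
constexpr double kScale = 1e6;

// Hypothetical helper: encode a degree value in [0, 360] as an integer.
// Rounding to nearest is an assumption; the real conversion may truncate.
std::int32_t encode_degrees(double degrees) {
    assert(degrees >= 0.0 && degrees <= 360.0);
    return static_cast<std::int32_t>(std::llround(degrees * kScale));
}

// Hypothetical helper: decode an encoded value back to degrees.
double decode_degrees(std::int32_t encoded) {
    return static_cast<double>(encoded) / kScale;
}

// Size in bytes of n values, assuming each is stored as a 32-bit integer.
constexpr std::int64_t raw_size_bytes(std::int64_t n_values) {
    return n_values * static_cast<std::int64_t>(sizeof(std::int32_t));
}
```

The largest encoded value, 360,000,000, fits in a signed 32-bit integer. Under the 32-bit assumption, 585634221 values occupy 2,342,536,884 bytes (roughly 2.18 GiB), which is in line with the "around 2.2GB" quoted for skyserver.data.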

Similarly, the SkyServer queries consist of a sequence of 158325 point queries on the right ascension column, formatted the same way (multiplied by 1 million and converted to integers). Run the following command to get them.

./run.sh get-skyserver-queries