Scala DBEst
A Scala implementation of DBEst Approximate Query Processing
About The Project
This project was carried out as a Semester Project at EPFL, in collaboration with the Laboratory for Data Intensive Applications & Systems of Prof. Anastasia Ailamaki and under the supervision of PhD student Viktor Sanca.
The project aims to build a Scala implementation of the DBEst Approximate Query Processor (AQP) on top of the well-known Apache Spark library. Traditional AQPs rely on data sampling to approximate a query answer. DBEst is a novel AQP that approximates the answer with Machine Learning models instead. This brings advantages in query response time, database portability, and data transfer: within a certain, manageable error tolerance, the underlying data is no longer needed to extract information from a database.
We chose to write a Spark-based implementation of DBEst as a first step towards extending the original DBEst implementation. It can be used to analyse the prospects of model-based querying in situations where there are constraints on query responsiveness, network data flow, or data storage.
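To make the idea of answering a query from models rather than from data concrete, here is a minimal sketch in plain Scala (no Spark). It is not the code of this repository: every name in it (ModelBasedAqpSketch, Kde, LinearModel, Models) is hypothetical, and a Gaussian kernel density estimate plus a least-squares line are simple stand-ins for whatever density and regression models an actual DBEst implementation trains. It assumes range queries over one numeric column x with an aggregate over a second column y, and approximates COUNT, SUM, and AVG by numerically integrating the density and the regression over the queried range.

```scala
// Minimal, Spark-free sketch of model-based AQP. All names are hypothetical.
object ModelBasedAqpSketch {

  // Gaussian kernel density estimate built from a sample of x values.
  final case class Kde(sample: Vector[Double], bandwidth: Double) {
    def apply(x: Double): Double = {
      val norm = 1.0 / (sample.size * bandwidth * math.sqrt(2 * math.Pi))
      norm * sample.map { s =>
        val u = (x - s) / bandwidth
        math.exp(-0.5 * u * u)
      }.sum
    }
  }

  // Straight line y = intercept + slope * x.
  final case class LinearModel(intercept: Double, slope: Double) {
    def apply(x: Double): Double = intercept + slope * x
  }

  // Ordinary least-squares fit of y on x.
  def fitLinear(xs: Vector[Double], ys: Vector[Double]): LinearModel = {
    require(xs.size == ys.size && xs.nonEmpty, "need matching, non-empty samples")
    val n = xs.size
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val slope = if (sxx == 0.0) 0.0 else sxy / sxx
    LinearModel(meanY - slope * meanX, slope)
  }

  // Midpoint-rule integration of f over [lb, ub].
  def integrate(f: Double => Double, lb: Double, ub: Double, steps: Int = 1000): Double = {
    val h = (ub - lb) / steps
    (0 until steps).map(i => f(lb + (i + 0.5) * h)).sum * h
  }

  // The models replace the table: only the row count and two models are kept.
  final case class Models(totalRows: Long, density: Kde, regression: LinearModel) {
    // SELECT COUNT(*) WHERE x BETWEEN lb AND ub
    def count(lb: Double, ub: Double): Double =
      totalRows * integrate(x => density(x), lb, ub)

    // SELECT SUM(y) WHERE x BETWEEN lb AND ub
    def sum(lb: Double, ub: Double): Double =
      totalRows * integrate(x => density(x) * regression(x), lb, ub)

    // SELECT AVG(y) WHERE x BETWEEN lb AND ub
    def avg(lb: Double, ub: Double): Double =
      sum(lb, ub) / count(lb, ub)
  }

  def main(args: Array[String]): Unit = {
    // Synthetic data: x evenly spread over [0, 10), y = 2x + 1.
    val xs = Vector.tabulate(1000)(i => i / 100.0)
    val ys = xs.map(x => 2.0 * x + 1.0)
    val models = Models(xs.size.toLong, Kde(xs, bandwidth = 0.3), fitLinear(xs, ys))

    // Once the models are trained, the raw data is no longer consulted.
    println(s"approx COUNT for x in [4, 6]: ${models.count(4.0, 6.0)}")
    println(s"approx AVG(y) for x in [4, 6]: ${models.avg(4.0, 6.0)}")
  }
}
```

On this synthetic data the exact answers are 201 rows and an average of 11, so the printed approximations should land close to those values. The point of the design is that, after training, a query only touches the Models object, which is far smaller than the table and can be shipped or stored independently of it.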
Please find here the report related to my work.
Built With
As mentioned above, the implementation relies on Apache Spark (2.4.6) and Scala (2.11.12). For the other libraries, please check the build.sbt file.
sbt (1.0.0) is also required to build the project.
Getting Started
Here are the steps to build the project and run the analysis experiments.
- Download the code, or clone it with the command
  git clone https://github.com/raphaelreis/DBest.git
- Edit the file
  conf/configuration.conf
  The main setting to adjust is your base directory path (the path to the project directory).
- Set up the directory paths for the results by running, from your working directory, the script
  scripts/setup.sh
- Build the project in the working directory with the command
  sbt package
- To run all the experiments, run the command
  scripts/runexp_sample.sh 1 2 3
- There is also a script to train the models beforehand:
  scripts/train_models.sh
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Raphaël Reis Nunes - @LinkedIn - email: raphael.reisnunes at epfl dot ch