
Approximate Query Processing Engine written in Scala



Scala DBEst


A Scala implementation of DBEst Approximate Query Processing

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. License
  5. Contact
  6. Acknowledgements

About The Project

This design was developed as a semester project at EPFL in collaboration with the Laboratory for Data Intensive Applications & Systems of Prof. Anastasia Ailamaki, under the supervision of PhD student Viktor Sanca.

This project aimed at building a Scala implementation of the DBEst Approximate Query Processor (AQP) using the well-known Apache Spark library. Traditional AQPs rely on data sampling to approximate a query answer. DBEst is a novel AQP that approximates the answer using machine learning models instead. This brings advantages in query response time, database portability and data transfer: within a given (manageable) error tolerance, the data itself is no longer required to retrieve information from a database.
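The model-based idea above can be sketched in a few lines. Following the formulation of the original DBEst paper as we understand it, a density model D(x) over the predicate column and a regression model R(x) of the aggregated column give, for a range predicate lo ≤ x ≤ hi over a table of N rows, COUNT ≈ N·∫D(x)dx, SUM ≈ N·∫D(x)R(x)dx and AVG ≈ SUM / COUNT. The sketch below is a minimal, Spark-free illustration under our own assumptions: a Gaussian kernel density estimate as D, least-squares linear regression as R and Simpson's rule for the integrals. The names ModelBasedAqpSketch, Kde, fitLinear and estimate are hypothetical; this is not this repository's code.

```scala
// Hypothetical sketch of model-based aggregate estimation in the spirit of DBEst.
// Assumptions: Gaussian KDE for the density D(x), OLS linear regression for R(x),
// composite Simpson's rule for the integrals. Not the repository's implementation.
object ModelBasedAqpSketch {

  // Gaussian kernel density estimate built from a sample of the predicate column x.
  final case class Kde(sample: Array[Double], bandwidth: Double) {
    private val norm = 1.0 / (sample.length * bandwidth * math.sqrt(2 * math.Pi))
    def density(x: Double): Double =
      norm * sample.iterator.map { xi =>
        val u = (x - xi) / bandwidth
        math.exp(-0.5 * u * u)
      }.sum
  }

  // Simple linear regression y = a + b * x.
  final case class LinearModel(a: Double, b: Double) {
    def predict(x: Double): Double = a + b * x
  }

  // Fit y = a + b * x by ordinary least squares.
  def fitLinear(xs: Array[Double], ys: Array[Double]): LinearModel = {
    val mx = xs.sum / xs.length
    val my = ys.sum / ys.length
    val sxy = xs.indices.map(i => (xs(i) - mx) * (ys(i) - my)).sum
    val sxx = xs.map(x => (x - mx) * (x - mx)).sum
    val b = sxy / sxx
    LinearModel(my - b * mx, b)
  }

  // Composite Simpson's rule over [lo, hi]; n must be even.
  def integrate(f: Double => Double, lo: Double, hi: Double, n: Int = 1000): Double = {
    val h = (hi - lo) / n
    val inner = (1 until n).map { i =>
      val w = if (i % 2 == 1) 4.0 else 2.0
      w * f(lo + i * h)
    }.sum
    h / 3.0 * (f(lo) + inner + f(hi))
  }

  // Approximate COUNT(*), SUM(y) and AVG(y) WHERE x BETWEEN lo AND hi
  // for a table of totalRows rows, using only the two models (no raw data).
  def estimate(kde: Kde, reg: LinearModel, totalRows: Long,
               lo: Double, hi: Double): (Double, Double, Double) = {
    val mass = integrate(kde.density, lo, hi)                              // ∫ D(x) dx
    val weighted = integrate(x => kde.density(x) * reg.predict(x), lo, hi) // ∫ D(x) R(x) dx
    (totalRows * mass, totalRows * weighted, weighted / mass)             // AVG = SUM / COUNT
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy table: 1000 rows, x = 0.0, 0.1, ..., 99.9 and y = 2x + 1.
    // The models are built from every tenth row only (a 10% sample).
    val sampleX = (0 until 100).map(_.toDouble).toArray
    val sampleY = sampleX.map(x => 2 * x + 1)
    val (count, sum, avg) =
      estimate(Kde(sampleX, 2.0), fitLinear(sampleX, sampleY), 1000L, 20.0, 60.0)
    println(f"COUNT ~ $count%.1f, SUM ~ $sum%.1f, AVG ~ $avg%.2f")
  }
}
```

On this toy table the exact answer for 20 ≤ x ≤ 60 is 401 rows with an average of 81; the model-based estimates should land close to those values without reading the full table. A real deployment would use stronger models and tune the kernel bandwidth, which is where the error tolerance mentioned above comes from.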

We decided to write a Spark-based implementation of DBEst as a first step towards extending the original DBEst implementation. It makes it possible to analyse the prospects of model-based querying in situations where there are constraints on query responsiveness, network data flow or data storage.

The report describing my work is available here.

Built With

As mentioned above, the implementation relies on the Apache Spark library (2.4.6) and Scala (2.11.12). For the other libraries, please check the build.sbt file.

sbt (1.0.0) is also required to build the project.
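As an illustration only, the versions stated above would appear roughly as follows in an sbt build definition. This is a sketch under assumptions: the Spark modules listed (spark-core and spark-sql) are our guess, and the project's actual build.sbt lists further dependencies that are not reproduced here.

```scala
// build.sbt (hypothetical sketch): only the versions stated in this README.
// The sbt version itself is pinned in project/build.properties: sbt.version=1.0.0
scalaVersion := "2.11.12"

// Assumed Spark modules; check the real build.sbt for the full dependency list.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.6",
  "org.apache.spark" %% "spark-sql"  % "2.4.6"
)
```

The %% operator appends the Scala binary version (here _2.11) to the artifact name, so the Spark artifacts must match the Scala version chosen above.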

Getting Started

Here are the steps to start building the project and run the analysis experiments.

  • First, download the code or clone it with the command git clone https://github.com/raphaelreis/DBest.git.

  • Then edit the conf/configuration.conf file. In particular, set your base directory path (the path to the project directory).

  • Set up the directories for the results by running the script scripts/setup.sh from your working directory.

  • Build the project from the working directory with the command sbt package.

  • To run all the experiments, use the command scripts/runexp_sample.sh 1 2 3.

  • A script to train the models beforehand is also provided: scripts/train_models.sh.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Raphaël Reis Nunes - @LinkedIn - email: raphael.reisnunes at epfl dot ch

Acknowledgements