hsql

An SQL engine on top of Hadoop

There are 4 files

lexer.py : this does the job of lexing
parser.py : this does the parsing of the SQL queries
main.py : this runs the interpreter (like python !), in which you can run SQL queries
implementation.py : this file contains the implementations of the SQL queries and some functions (must be completed)

The SQL Queries Currently Supported :

select
use
drop
load
create database
schema (not part of standard SQL; added to view the schema of databases and tables)
current database (also not part of standard SQL; added to show the currently selected database)
exit() or quit() (to quit the interpreter)

What are the requirements ?

hadoop
python3
ply

How to install ply ?

pip3 install ply

How to run the interpreter ?

python3 main.py

What's Currently Working ?

use
create database
load (partially)
drop
schema
current database

What must be done ?

Make it work on Hadoop (create and delete files/folders in Hadoop. It currently works on the local file system, not on Hadoop. remove() in implementation.py may have to change. This can probably be done at the end.)
Implement load completely (it currently only writes the metadata (schema info) into database_name.schema. It must split the CSV file into columns and store each column as a separate file. All addresses are already passed into the load function; it must be completed.)
Implement select
Implement aggregate functions MAX, COUNT, SUM
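
The missing part of load (splitting the CSV file into one file per column) could look roughly like the sketch below. It works on the local file system only. The function name split_csv_into_columns, its arguments, the header-less CSV input and the one-value-per-line column file format are all assumptions made for illustration; they are not taken from implementation.py.

```python
import csv
from pathlib import Path


def split_csv_into_columns(csv_path, table_dir, column_names):
    """Write each column of a header-less CSV file to its own file in table_dir.

    Hypothetical helper: the name, the arguments and the one-value-per-line
    column file format are assumptions, not the project's actual design.
    """
    table_dir = Path(table_dir)
    table_dir.mkdir(parents=True, exist_ok=True)

    # Open one output file per column, named after the column.
    handles = [open(table_dir / name, "w", newline="") for name in column_names]
    try:
        with open(csv_path, newline="") as source:
            for row in csv.reader(source):
                if not row:
                    continue  # skip blank lines
                if len(row) != len(column_names):
                    raise ValueError(
                        f"expected {len(column_names)} fields, got {len(row)}"
                    )
                # The i-th value of the row goes to the i-th column file.
                for handle, value in zip(handles, row):
                    handle.write(value + "\n")
    finally:
        for handle in handles:
            handle.close()
```

Keeping one file per column means a query that touches a single column only has to read that file, which is presumably why the layout was chosen.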

Note : The mappers/reducers may have to be written in separate files and called via system calls, through the Hadoop streaming API, from the wrapper functions select, load, MAX, COUNT and SUM.
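
As a rough idea of what such a mapper/reducer pair could look like for MAX, here is a minimal sketch. Hadoop streaming passes data to the scripts on stdin and reads tab-separated key/value lines from stdout. The function names, the constant key "max" and the "map"/"reduce" command-line argument are assumptions for illustration. The sketch also keeps both roles in one file for brevity, while the note above suggests separate files.

```python
import sys


def map_column(lines):
    """Mapper: emit one 'max<TAB>value' pair per non-empty input line."""
    for line in lines:
        value = line.strip()
        if value:
            yield f"max\t{value}"


def reduce_max(lines):
    """Reducer: return the largest numeric value among the mapper output lines."""
    largest = None
    for line in lines:
        _key, value = line.rstrip("\n").split("\t", 1)
        number = float(value)
        if largest is None or number > largest:
            largest = number
    return largest


if __name__ == "__main__":
    # Hypothetical convention: pass "map" or "reduce" as the only argument.
    if sys.argv[1:] == ["map"]:
        for pair in map_column(sys.stdin):
            print(pair)
    elif sys.argv[1:] == ["reduce"]:
        print(reduce_max(sys.stdin))
```

Because every pair uses the same key, all values end up in a single reducer, which is fine for a sketch but would need rethinking for large columns. COUNT and SUM could follow the same pattern with a different reducer.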

The Directory Organization

DATABASE_ROOT/
    database_name.schema
    dblist.db
    database_name/
        table_name/
            column_name

Note

dblist.db is the file which contains the list of all the databases (there is only 1 such file)
There is one schema file per database
There is one directory for each database
Column files contain the data of one column (a column name cannot repeat within a table, but the same name can appear in another table in the same db)
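
The layout above maps directly to paths. The helpers below are a small sketch of that mapping; the function names and the example root /data/hsql are hypothetical and only illustrate the directory organization described here.

```python
from pathlib import Path


def dblist_path(root):
    """Path of the single file listing all databases."""
    return Path(root) / "dblist.db"


def schema_path(root, database):
    """Path of the schema file of one database."""
    return Path(root) / f"{database}.schema"


def column_path(root, database, table, column):
    """Path of the file holding the data of one column of one table."""
    return Path(root) / database / table / column
```

For example, column_path("/data/hsql", "sales", "orders", "price") points at the price column of the orders table in the sales database.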

I have commented as many of the important lines as possible. If you have any doubts, call me.

virajbukitagar123/hsql