- lexer.py : tokenizes the input SQL (lexing)
- parser.py : parses the SQL queries
- main.py : runs the interactive interpreter (like the Python shell), where you can run SQL queries
- implementation.py : contains the implementations of the SQL queries and some helper functions (must be completed)
Supported commands:
- select
- use
- drop
- load
- create database
- schema (not in standard SQL; added to view the schema of databases and tables)
- current database (also not in standard SQL; added to show the currently selected database)
- exit() or quit() (to quit the interpreter)
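As a rough illustration of how these commands flow through the interpreter, here is a minimal sketch of a read-eval loop. This is not the actual main.py; `execute` is a hypothetical stand-in for the ply-based parser and the functions in implementation.py.

```python
# Minimal sketch of the interpreter's read-eval loop.
# `execute` is a hypothetical dispatcher; the real project
# parses the query with ply and calls into implementation.py.

def execute(query: str) -> str:
    q = query.strip().rstrip(";").lower()
    if q in ("exit()", "quit()"):
        raise SystemExit            # leave the interpreter
    if q == "current database":
        return "no database selected"
    # Placeholder for the real parse/dispatch step.
    return f"parsed: {q}"

def repl(lines):
    """Feed a sequence of input lines through execute(), stop on exit()."""
    results = []
    for line in lines:
        try:
            results.append(execute(line))
        except SystemExit:
            break
    return results
```

Running `repl(["use db1;", "current database", "exit()"])` would process the first two queries and stop at `exit()`.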
Dependencies: hadoop, python3, ply

Install ply:
pip3 install ply

Run the interpreter:
python3 main.py
Implemented so far:
- use
- create database
- load (partially)
- drop
- schema
- current database
To do:
- Make it work on Hadoop (create and delete files/folders in HDFS; it currently works on the local file system, not Hadoop. remove() in implementation.py may need to be changed; this can probably be left for the end)
- Complete load (currently it only writes the metadata (schema info) into database_name.schema; it must also split the CSV file into columns and store each column as a separate file. All the required addresses are already passed into the load function)
- Implement select
- Implement the aggregate functions MAX, COUNT and SUM
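For the aggregate functions, one possible shape is to read a column file (one value per line, per the storage layout below) and fold over it. This is only a sketch under that assumed file format; the function names are placeholders, not the project's actual API.

```python
# Sketch of aggregate wrappers over the column-per-file layout,
# assuming each column file stores one value per line.
# read_column / col_* are hypothetical names, not the real API.

def read_column(path):
    """Return the non-empty lines of a column file as strings."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def col_count(values):
    return len(values)

def col_sum(values):
    # Assumes a numeric column; a real implementation would
    # check the type recorded in the .schema file first.
    return sum(float(v) for v in values)

def col_max(values):
    return max(float(v) for v in values)
```

A real SELECT with aggregates could dispatch to these after resolving the column's file path from the schema metadata.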
Note: the mappers/reducers may have to be written in separate files and invoked via system calls from the wrapper functions (select, load, MAX, COUNT and SUM) through the Hadoop Streaming API.
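To make the note concrete, here is one hedged sketch of what a SUM mapper/reducer pair for Hadoop Streaming could look like, assuming each input line is a single numeric column value. In the real setup these would live in separate mapper.py/reducer.py scripts reading sys.stdin and be launched through the streaming jar; here they are plain generators so the logic is visible.

```python
# Sketch of a SUM mapper/reducer for Hadoop Streaming.
# Streaming passes lines on stdin and expects tab-separated
# key/value output; mapper.py and reducer.py would wrap these
# generators with `for out in mapper(sys.stdin): print(out)`.

def mapper(lines):
    # Emit every value under a single "sum" key.
    for line in lines:
        line = line.strip()
        if line:
            yield f"sum\t{line}"

def reducer(lines):
    # Streaming delivers mapper output sorted by key, so all
    # "sum" records arrive together; accumulate their values.
    total = 0.0
    for line in lines:
        _, value = line.strip().split("\t")
        total += float(value)
    yield f"sum\t{total}"
```

COUNT and MAX would follow the same pattern with a different accumulator.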
DATABASE_ROOT/
    database_name.schema
    dblist.db
    database_name/
        table_name/
            column_name
Note:
- dblist.db is a file containing the list of all the databases (there is only one such file)
- There is one schema file per database
- There is one directory for each database
- Each column file contains the data of one column (a column name cannot repeat within a table, but the same name can appear in another table of the same database)
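Tying the layout to the unfinished load command, a completed load could split a CSV into per-column files under DATABASE_ROOT/database_name/table_name/. The sketch below assumes the CSV's first row holds the column names; `load_csv` is a hypothetical helper, not the project's actual load function.

```python
# Hedged sketch of a completed load: split a CSV into one file
# per column, matching the DATABASE_ROOT layout above.
# Assumes the first CSV row is the header of column names.

import csv
import os

def load_csv(csv_path, database_root, db_name, table_name):
    table_dir = os.path.join(database_root, db_name, table_name)
    os.makedirs(table_dir, exist_ok=True)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)                    # column names
        columns = {name: [] for name in header}
        for row in reader:
            for name, value in zip(header, row):
                columns[name].append(value)
    # One file per column, one value per line.
    for name, values in columns.items():
        with open(os.path.join(table_dir, name), "w") as out:
            out.write("\n".join(values) + "\n")
    return sorted(columns)
```

On Hadoop, the same split could be done locally and the column files then copied into HDFS (e.g. with `hdfs dfs -put`) rather than written with open().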
I have commented as many of the important lines as possible. If you have any doubts, call me.