- Implement the TPC-C New Order transaction and storage for all relations as a standalone C++ program.
- Implement the load procedures to populate the database with the data provided in the *.tbl files.
- Translate the New Order transaction to C++ using the mapping from SQL to C data types.
- integer: int32_t
- char, varchar: length plus character array (do not use pointers or std::string)
- timestamp: uint64_t (seconds since 01.01.1970)
- numeric(x, y): int64_t
(x is the total number of digits, y is the number of digits after the decimal separator; for example, with numeric(10,2) the value 12.3 is represented as 1230)
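A hedged sketch of how the mapping above could look in C++; the names `Char` and `Numeric` are illustrative, not prescribed by the assignment:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstring>

// Fixed-size string: length plus character array, no pointers, no std::string.
template <unsigned N>
struct Char {
    uint16_t length = 0;
    std::array<char, N> data{};
    void assign(const char* s) {
        length = static_cast<uint16_t>(std::min<size_t>(std::strlen(s), N));
        std::memcpy(data.data(), s, length);
    }
};

// numeric(x, y): fixed-point value scaled by 10^y, stored in an int64_t.
// With Numeric<10, 2>, the value 12.3 is stored as the raw integer 1230.
template <unsigned X, unsigned Y>
struct Numeric {
    int64_t raw = 0;  // value * 10^Y
};
```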
- Implement the New Order transaction as C++ function.
- Execute the New Order transaction 1 million times with random input.
- Print the number of tuples in the tables before and after executing the transactions.
How many New Order transactions per second can your implementation execute?
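One way to measure this; the helper below is a sketch, and the callable passed in stands in for whatever New Order implementation you end up with:

```cpp
#include <chrono>

// Run `txn` `runs` times and return the throughput in transactions/second.
// `txn` is any callable standing in for the New Order transaction.
template <class F>
double transactionsPerSecond(F&& txn, int runs) {
    auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) txn();
    auto end = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(end - begin).count();
    return runs / seconds;
}
```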
Useful files:
- CMake configuration (CMakeLists.txt)
- Type header (include/imlab/infra/types.h)
- Key hashing (include/imlab/infra/key.h)
- Database header (include/imlab/database.h)
- Database defs (src/database.cc)
- Database tests (test/database_test.cc)
- Executable (tools/imlabdb.cc)
- Use bison and flex to write a parser for SQL schema files.
- You can assume that all attributes are declared NOT NULL.
- You must support primary keys.
- You can ignore secondary indexes (for now).
- Write a schema compiler that generates C++ code from the schema representation.
- You must implement the CRUD operations (Create, Read, Update, Delete) for every relation.
- It must further be possible to look up a tuple by its primary key (if any).
- Use the schema compiler to generate code for the schema and add it to your repository.
- Re-implement the New Order transaction as C++ function using your generated code.
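As a rough idea of the shape the generated code might take, here is a hedged sketch for the relation "foo bar2 { c integer };" with primary key (c); the class name, column layout, and swap-remove deletion strategy are illustrative choices, not requirements:

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical output of a schema compiler for bar2 with primary key (c).
struct bar2Table {
    std::vector<int32_t> c;                          // column storage
    std::unordered_map<int32_t, size_t> primaryKey;  // c -> row index (tid)

    size_t insert(int32_t cVal) {                    // Create
        size_t tid = c.size();
        c.push_back(cVal);
        primaryKey.emplace(cVal, tid);
        return tid;
    }
    std::optional<size_t> lookup(int32_t cVal) const {  // Read by primary key
        auto it = primaryKey.find(cVal);
        if (it == primaryKey.end()) return std::nullopt;
        return it->second;
    }
    void update(size_t tid, int32_t cVal) { c[tid] = cVal; }  // Update

    void remove(size_t tid) {                        // Delete: swap with last row
        primaryKey.erase(c[tid]);
        if (tid + 1 != c.size()) {
            c[tid] = c.back();
            primaryKey[c[tid]] = tid;
        }
        c.pop_back();
    }

    size_t size() const { return c.size(); }
};
```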
The project skeleton for this task contains a grammar & scanner for a toy format called FOO, which looks as follows:
foo bar1 {
a integer,
b char{20}
};
foo bar2 {
c integer
};
You can use the executable schemac to parse a FOO file with the following command:
./schemac --in ~/Desktop/example.foo --out_cc /tmp/example.cc --out_h /tmp/example.h
First learn how bison & flex work together and how they are used in the following files:
Then use the parsed schema to generate C++ code with a schema compiler:
- Implement the Delivery transaction.
- Execute 1 million transactions composed of a mix of New Order (90%) and Delivery (10%) transactions.
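Choosing the transaction type per iteration might look like this sketch; the enum names and seeding are illustrative:

```cpp
#include <random>

// Transaction types in the 90% New Order / 10% Delivery mix.
enum class Txn { kNewOrder, kDelivery };

// Draw a transaction type with the required probabilities.
Txn pickTransaction(std::mt19937_64& gen) {
    std::uniform_int_distribution<int> dist(0, 99);
    return dist(gen) < 90 ? Txn::kNewOrder : Txn::kDelivery;
}
```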
- Implement this analytical query using STL datastructures for the join implementation.
- Do not use any existing index structures.
- Follow this interface.
- The result of the query on our dataset should be 1367872001.25.
- Execute OLTP and OLAP queries concurrently.
- Execute 1 million transactions of the NewOrder/Delivery mix.
- Run the analytical query concurrently using the fork system call (in imlabdb.cc).
- Whenever an analytical query is finished, create a new snapshot using fork, such that exactly one snapshot and query is active at any given time.
- This example for using fork may be helpful.
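The snapshot loop could be structured as in this sketch; `runQuery` stands in for the actual analytical query, and waiting on the child is what keeps exactly one snapshot alive at a time:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

// Run `numQueries` analytical queries, each in a forked child that sees a
// copy-on-write snapshot of the database. Returns how many completed.
template <class F>
int snapshotLoop(int numQueries, F&& runQuery) {
    int completed = 0;
    for (int q = 0; q < numQueries; ++q) {
        pid_t child = fork();          // child gets a consistent snapshot
        if (child == 0) {
            runQuery();                // query reads the snapshot
            std::exit(0);
        }
        // The parent could keep processing OLTP transactions here; this
        // sketch simply waits, so exactly one snapshot and query is active.
        int status = 0;
        waitpid(child, &status, 0);
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) ++completed;
    }
    return completed;
}
```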
- Implement your own lazy hash table using this interface.
- Measure the runtimes of the analytical query with the STL datastructure and your hash table.
- Implement query compilation of relational algebra trees to C++ code. Use the produce/consume (push) model.
- You need to support the following operators: table scan, selection, print and inner join.
- Test your code to make sure that the IUs are correctly propagated through the operators.
- Manually build an algebra tree for this SQL query.
- Write a query compiler that generates C++ code for the static algebra tree.
- Add the generated C++ code to your repository and execute it in imlabdb.cc using the print operator.
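The produce/consume model can be sketched as follows; for brevity this is reduced to table scan and print (selection and join follow the same pattern), and emitting into a plain `std::string` stands in for a real code buffer:

```cpp
#include <string>

// Push-model code generation: produce() asks an operator to generate code
// that yields tuples; consume() generates the code that handles one tuple
// pushed up from a child.
struct Operator {
    Operator* parent = nullptr;
    std::string* out = nullptr;   // generated C++ code accumulates here
    virtual void produce() = 0;
    virtual void consume(Operator* child) = 0;
    virtual ~Operator() = default;
};

struct Print : Operator {
    Operator* input;
    Print(Operator* in, std::string* o) : input(in) {
        out = o;
        in->parent = this;
        in->out = o;
    }
    void produce() override { input->produce(); }
    void consume(Operator*) override { *out += "std::cout << tid << '\\n';\n"; }
};

struct TableScan : Operator {
    std::string table;
    explicit TableScan(std::string t) : table(std::move(t)) {}
    void produce() override {
        *out += "for (size_t tid = 0; tid < " + table + ".size(); ++tid) {\n";
        parent->consume(this);   // push the current tuple to the parent
        *out += "}\n";
    }
    void consume(Operator*) override {}   // scans are leaves
};
```

Calling `produce()` on the root (here: Print) then generates a scan loop whose body prints each tuple.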
- Implement a simple parser for the following subset of SQL:
- The select clause contains one or more attribute names.
- The from clause contains one or more relation names.
- The where clause is optional and can contain one or more selections (connected by "and").
- You only need to support selections of the form "attribute = attribute" and "attribute = constant".
- You can assume that each relation is referenced only once and that all attribute names are unique.
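A toy recursive-descent parser for this subset might look like the sketch below; it assumes whitespace-separated tokens (so "a = b", not "a=b") and no statement terminator, which keeps the sketch short:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Query {
    std::vector<std::string> select;
    std::vector<std::string> from;
    std::vector<std::pair<std::string, std::string>> where;  // lhs = rhs
};

// Parse e.g. "select a, b from r, s where a = b and c = 1".
Query parseQuery(const std::string& text) {
    std::istringstream in(text);
    std::string tok;
    Query q;
    if (!(in >> tok) || tok != "select") throw std::runtime_error("expected select");
    while (in >> tok && tok != "from") {              // attribute list
        if (!tok.empty() && tok.back() == ',') tok.pop_back();
        q.select.push_back(tok);
    }
    while (in >> tok && tok != "where") {             // relation list
        if (!tok.empty() && tok.back() == ',') tok.pop_back();
        q.from.push_back(tok);
    }
    while (tok == "where" || tok == "and") {          // optional conjunction
        std::string lhs, eq, rhs;
        if (!(in >> lhs >> eq >> rhs) || eq != "=")
            throw std::runtime_error("expected attribute = attribute/constant");
        q.where.emplace_back(lhs, rhs);
        if (!(in >> tok)) break;                      // "and" continues
    }
    return q;
}
```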
- Implement a semantic analysis that takes the parsed SQL statement and the schema as input and computes an algebra tree that can be used as input to your code generator.
- Report errors when non-existing attributes or tables are referenced.
- You should report an error if a table has no join condition (i.e., a cross product would be necessary).
- Build left-deep join trees based on the order of the relations in the from clause.
- Implement a REPL ("read-eval-print-loop") for your database system:
- Read one line from standard input.
- Analyze it (report any errors and abort if necessary).
- Generate C++ code and store it in a temporary file.
- Call a C++ compiler and generate a temporary shared library (.so file).
- Using dlopen, dlsym, and dlclose you can now load the shared library and execute the query in it (have a look at this example); you may need to compile with -fPIC and -rdynamic.
- Measure and report the compilation and the execution times separately.
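The compile-load-execute cycle could be sketched like this; the compiler invocation, the paths, and the exported symbol name "Run" are all illustrative assumptions:

```cpp
#include <dlfcn.h>
#include <cstdio>
#include <cstdlib>

// Compile the generated C++ file `cc` into a shared library `so`, load it,
// and run the exported query entry point. Returns 0 on success, -1 on error.
int runCompiledQuery(const char* cc, const char* so) {
    char cmd[512];
    std::snprintf(cmd, sizeof(cmd),
                  "c++ -O2 -fPIC -shared -rdynamic %s -o %s", cc, so);
    if (std::system(cmd) != 0) return -1;   // compilation failed

    void* lib = dlopen(so, RTLD_NOW);
    if (!lib) return -1;
    using QueryFn = void (*)();
    auto run = reinterpret_cast<QueryFn>(dlsym(lib, "Run"));
    if (run) run();                          // execute the compiled query
    dlclose(lib);
    return run ? 0 : -1;
}
```

The generated file would need to export the entry point with C linkage, e.g. `extern "C" void Run()`, so that dlsym can find it by name.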
- Test your database with these queries.
How many transactions per second can your implementation execute?
Make sure your builds are not failing!
Left Sidebar > CI/CD > Pipelines