IMLAB

Task 1

  • Implement the TPC-C New Order transaction and storage for all relations as a standalone C++ program.
  • Implement the load procedures to populate the database with the data provided in the *.tbl files.
  • Translate the New Order transaction to C++ using the mapping from SQL to C data types.
    • integer: int32_t
    • char, varchar: length plus character array (do not use pointers or std::string)
    • timestamp: uint64_t (seconds since 01.01.1970)
    • numeric(x, y): int64_t
      (x is the total number of digits, y is the number of digits after the decimal separator; for example, with numeric(10,2) the value 12.30 is represented as 1230)
  • Implement the New Order transaction as a C++ function.
  • Execute the New Order transaction 1 million times with random input.
  • Print the number of tuples in the tables before and after executing the transactions.
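The type mapping above might translate into plain structs like the following sketch. The `Varchar` wrapper and the warehouse fields are illustrative (column names follow the TPC-C schema); adapt them to the relations in the provided *.tbl files.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Fixed-size string: length plus character array, as required
// (no pointers, no std::string).
template <unsigned MaxLen>
struct Varchar {
    uint16_t length = 0;
    std::array<char, MaxLen> data{};

    void set(const char* s) {
        std::size_t n = std::strlen(s);
        length = static_cast<uint16_t>(n < MaxLen ? n : MaxLen);
        std::memcpy(data.data(), s, length);
    }
};

// Hypothetical mapping of the TPC-C warehouse relation.
struct Warehouse {
    int32_t w_id;        // integer
    Varchar<10> w_name;  // varchar(10)
    int64_t w_tax;       // numeric(4,4): 0.1234 stored as 1234
    int64_t w_ytd;       // numeric(12,2): 300000.00 stored as 30000000
};
```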

How many New Order transactions per second can your implementation execute?
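One way to answer this is to time the transaction loop with std::chrono; the `newOrderRandom` placeholder below stands in for your actual transaction with random input.

```cpp
#include <chrono>

// Placeholder for the actual New Order transaction with random input.
inline void newOrderRandom() { /* touch the tables here */ }

// Returns transactions per second for `count` executions.
inline double benchmarkNewOrder(long count) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < count; ++i)
        newOrderRandom();
    auto end = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(end - start).count();
    return count / seconds;
}
```

For example, `std::printf("%.0f tx/s\n", benchmarkNewOrder(1'000'000));` prints the throughput for the required 1 million transactions.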

Task 2

  • Use bison and flex to write a parser for SQL schema files.
    • You can assume that all attributes are declared NOT NULL.
    • You must support primary keys.
    • You can ignore secondary indexes (for now).
  • Write a schema compiler that generates C++ code from the schema representation.
    • You must implement the CRUD operations (Create, Read, Update, Delete) for every relation.
    • It must also be possible to look up a tuple by its primary key (if any).
    • Use the schema compiler to generate code for the schema and add it to your repository.
  • Re-implement the New Order transaction as a C++ function using your generated code.
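The generated code for a relation with a composite primary key might look roughly like this sketch. The relation, its columns, and the class names are hypothetical; your schema compiler would emit the equivalent for every relation in the parsed schema.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical output for a relation declared as
//   district(d_id integer, d_w_id integer, d_ytd numeric(12,2),
//            primary key (d_w_id, d_id))
struct District {
    int32_t d_id;
    int32_t d_w_id;
    int64_t d_ytd;
};

class DistrictTable {
    using Key = std::tuple<int32_t, int32_t>;  // (d_w_id, d_id)
    std::map<Key, District> rows_;

 public:
    // Create / Update
    void insert(const District& d) { rows_[{d.d_w_id, d.d_id}] = d; }
    // Read: lookup by primary key; nullptr if not present.
    District* lookup(int32_t w_id, int32_t d_id) {
        auto it = rows_.find({w_id, d_id});
        return it == rows_.end() ? nullptr : &it->second;
    }
    // Delete
    void erase(int32_t w_id, int32_t d_id) { rows_.erase({w_id, d_id}); }
    std::size_t size() const { return rows_.size(); }
};
```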

The project skeleton for this task contains a grammar & scanner for a toy format called FOO, which looks as follows:

foo bar1 {
    a integer,
    b char{20}
};

foo bar2 {
    c integer
};

You can use the executable schemac to parse a FOO file with the following command:

./schemac --in ~/Desktop/example.foo --out_cc /tmp/example.cc --out_h /tmp/example.h

First, learn how bison & flex work together and how they are used in the files provided in the skeleton.

Then use the parsed schema to generate C++ code with a schema compiler.

Task 3

  • Implement the Delivery transaction.
  • Execute 1 million transactions composed of a mix of New Order (90%) and Delivery (10%) transactions.
  • Implement this analytical query using STL data structures for the join implementation.
    • Do not use any existing index structures.
    • Follow this interface.
    • The result of the query on our dataset should be 1367872001.25.
  • Execute OLTP and OLAP queries concurrently.
    • Execute 1 million transactions of the NewOrder/Delivery mix.
    • Run the analytical query concurrently using the fork system call (in imlabdb.cc).
    • Whenever an analytical query finishes, create a new snapshot using fork, so that exactly one snapshot and one query are active at any given time.
    • This example for using fork may be helpful.
  • Implement your own lazy hash table using this interface.
  • Measure the runtimes of the analytical query with the STL data structure and your hash table.
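The fork-based snapshot mechanism can be sketched as follows. The child process sees a copy-on-write snapshot of the address space at the time of fork(), so the analytical query runs on a consistent state while OLTP continues in the parent. The `balance` variable and the `runOnSnapshot` helper are illustrative, not part of the skeleton.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

static long balance = 100;  // stand-in for the database state

// Runs `query` in a child process on a copy-on-write snapshot taken by
// fork(); the parent (OLTP side) continues immediately and may modify
// the state without affecting the snapshot.
inline pid_t runOnSnapshot(void (*query)()) {
    pid_t child = fork();
    if (child == 0) {   // child: sees the state as of the fork
        query();
        std::_Exit(0);  // avoid running parent cleanup twice
    }
    return child;       // parent: keep processing transactions
}
```

The parent would later reap the child with waitpid and, per the task, immediately fork the next snapshot.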

Task 4

  • Implement query compilation of relational algebra trees to C++ code. Use the produce/consume (push) model.
  • You need to support the following operators: table scan, selection, print and inner join.
  • Test your code to make sure that the IUs are correctly propagated through the operators.
  • Manually build an algebra tree for this SQL query.
  • Write a query compiler that generates C++ code for the static algebra tree.
  • Add the generated C++ code to your repository and execute it in imlabdb.cc using the print operator.
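A minimal sketch of the produce/consume (push) model: Produce() asks the pipeline to start generating code, and each operator calls Consume() on its parent for every "tuple" position. The operator set and the emitted code fragments here are simplified assumptions (single attribute `a`, code emitted as strings), not the required interface.

```cpp
#include <iostream>
#include <memory>
#include <sstream>
#include <string>

struct Operator {
    Operator* parent = nullptr;
    virtual ~Operator() = default;
    virtual void Produce(std::ostream& out) = 0;
    virtual void Consume(std::ostream& out, Operator* child) = 0;
};

struct Print : Operator {
    std::unique_ptr<Operator> input;
    explicit Print(std::unique_ptr<Operator> in) : input(std::move(in)) {
        input->parent = this;
    }
    void Produce(std::ostream& out) override { input->Produce(out); }
    void Consume(std::ostream& out, Operator*) override {
        out << "std::cout << tuple.a << '\\n';\n";
    }
};

struct Selection : Operator {
    std::unique_ptr<Operator> input;
    std::string predicate;
    Selection(std::unique_ptr<Operator> in, std::string p)
        : input(std::move(in)), predicate(std::move(p)) {
        input->parent = this;
    }
    void Produce(std::ostream& out) override { input->Produce(out); }
    void Consume(std::ostream& out, Operator*) override {
        out << "if (" << predicate << ") {\n";
        parent->Consume(out, this);  // push the tuple upward
        out << "}\n";
    }
};

struct TableScan : Operator {
    std::string table;
    explicit TableScan(std::string t) : table(std::move(t)) {}
    void Produce(std::ostream& out) override {
        out << "for (auto& tuple : " << table << ") {\n";
        parent->Consume(out, this);  // body is generated by the parents
        out << "}\n";
    }
    void Consume(std::ostream&, Operator*) override {}
};
```

Calling Produce on `Print(Selection(TableScan("r"), "tuple.a > 10"))` emits one fused loop: the scan opens the loop, the selection nests its if, and the print emits the output statement inside.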

Task 5

  • Implement a simple parser for the following subset of SQL:

    • The select clause contains one or more attribute names.
    • The from clause contains one or more relation names.
    • The where clause is optional and can contain one or more selections (connected by "and").
    • You only need to support selections of the form "attribute = attribute" and "attribute = constant".
    • You can assume that each relation is referenced only once, and that all attribute names are unique.
  • Implement a semantic analysis, which takes the parsed SQL statement and the schema as input and computes an algebra tree that can be used as input to your code generator.

    • Report errors when non-existing attributes or tables are referenced.
    • You should report an error if a table has no join condition (i.e., a cross product would be necessary).
    • Build left-deep join trees based on the order of the relations in the from clause.
  • Implement a REPL ("read-eval-print-loop") for your database system:

    • Read one line from standard input.
    • Analyze it (report any errors and abort if necessary).
    • Generate C++ code and store it in a temporary file.
    • Call a C++ compiler and generate a temporary shared library (.so file).
    • Using dlopen, dlsym, and dlclose you can now load the shared library and execute the query in it (have a look at this example); you may need to compile with -fPIC and -rdynamic.
    • Measure and report the compilation and the execution times separately.
  • Test your database with these queries.
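The compile-and-load step of the REPL can be sketched as below. The file paths, the helper name, and the exported symbol are assumptions; it presumes g++ is on the PATH, and the program itself must be linked with -ldl on older glibc versions. The query function is declared extern "C" so dlsym can find it without name mangling.

```cpp
#include <dlfcn.h>
#include <cstdlib>
#include <fstream>
#include <string>

// Writes `source` to a temporary file, compiles it into a shared
// library, and returns the dlopen handle (nullptr on failure).
inline void* compileToLibrary(const std::string& source) {
    std::ofstream("/tmp/query.cc") << source;
    if (std::system("g++ -shared -fPIC -o /tmp/query.so /tmp/query.cc") != 0)
        return nullptr;
    return dlopen("/tmp/query.so", RTLD_NOW);
}
```

Usage: `auto run = reinterpret_cast<int (*)()>(dlsym(lib, "Run"));` fetches the entry point; calling `run()` executes the freshly compiled query, and `dlclose(lib)` unloads it afterwards. Take the two timestamps around compileToLibrary and around the call to report compilation and execution times separately.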

How many transactions per second can your implementation execute?

Make sure your builds are not failing!
Left Sidebar > CI/CD > Pipelines