IMLAB

Task 1

  • Implement the TPC-C New Order transaction and storage for all relations as a standalone C++ program.
  • Implement the load procedures to populate the database with the data provided in the *.tbl files.
  • Translate the New Order transaction to C++ using the mapping from SQL to C data types.
    • integer: int32_t
    • char, varchar: length plus character array (do not use pointers or std::string)
    • timestamp: uint64_t (seconds since 01.01.1970)
    • numeric(x, y): int64_t
      (x is the total number of digits, y is the number of digits after the decimal separator; for example, with numeric(10,2) the value 12.30 is represented as 1230)
  • Implement the New Order transaction as a C++ function.
  • Execute the New Order transaction 1 million times with random input.
  • Print the number of tuples in the tables before and after executing the transactions.
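The type mapping above might translate into plain structs like the following sketch. The `Varchar` wrapper and the warehouse fields are illustrative (column names follow the TPC-C schema); adapt them to the relations in the provided *.tbl files.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Fixed-size string: length plus character array, as required
// (no pointers, no std::string).
template <unsigned MaxLen>
struct Varchar {
    uint16_t length = 0;
    std::array<char, MaxLen> data{};

    void set(const char* s) {
        std::size_t n = std::strlen(s);
        length = static_cast<uint16_t>(n < MaxLen ? n : MaxLen);
        std::memcpy(data.data(), s, length);
    }
};

// Hypothetical mapping of the TPC-C warehouse relation.
struct Warehouse {
    int32_t w_id;        // integer
    Varchar<10> w_name;  // varchar(10)
    int64_t w_tax;       // numeric(4,4): 0.1234 stored as 1234
    int64_t w_ytd;       // numeric(12,2): 300000.00 stored as 30000000
};
```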

How many New Order transactions per second can your implementation execute?
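One way to answer this is to time the transaction loop with std::chrono; the `newOrderRandom` placeholder below stands in for your actual transaction with random input.

```cpp
#include <chrono>

// Placeholder for the actual New Order transaction with random input.
inline void newOrderRandom() { /* touch the tables here */ }

// Returns transactions per second for `count` executions.
inline double benchmarkNewOrder(long count) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < count; ++i)
        newOrderRandom();
    auto end = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(end - start).count();
    return count / seconds;
}
```

For example, `std::printf("%.0f tx/s\n", benchmarkNewOrder(1'000'000));` prints the throughput for the required 1 million transactions.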

Task 2

  • Use bison and flex to write a parser for SQL schema files.
    • You can assume that all attributes are declared NOT NULL.
    • You must support primary keys.
    • You can ignore secondary indexes (for now).
  • Write a schema compiler that generates C++ code from the schema representation.
    • You must implement the CRUD operations (Create, Read, Update, Delete) for every relation.
    • It must also be possible to look up a tuple by its primary key (if any).
    • Use the schema compiler to generate code for the schema and add it to your repository.
  • Re-implement the New Order transaction as a C++ function using your generated code.
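The generated code for a relation with a composite primary key might look roughly like this sketch. The relation, its columns, and the class names are hypothetical; your schema compiler would emit the equivalent for every relation in the parsed schema.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical output for a relation declared as
//   district(d_id integer, d_w_id integer, d_ytd numeric(12,2),
//            primary key (d_w_id, d_id))
struct District {
    int32_t d_id;
    int32_t d_w_id;
    int64_t d_ytd;
};

class DistrictTable {
    using Key = std::tuple<int32_t, int32_t>;  // (d_w_id, d_id)
    std::map<Key, District> rows_;

 public:
    // Create / Update
    void insert(const District& d) { rows_[{d.d_w_id, d.d_id}] = d; }
    // Read: lookup by primary key; nullptr if not present.
    District* lookup(int32_t w_id, int32_t d_id) {
        auto it = rows_.find({w_id, d_id});
        return it == rows_.end() ? nullptr : &it->second;
    }
    // Delete
    void erase(int32_t w_id, int32_t d_id) { rows_.erase({w_id, d_id}); }
    std::size_t size() const { return rows_.size(); }
};
```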

The project skeleton for this task contains a grammar & scanner for a toy format called FOO, which looks as follows:

foo bar1 {
    a integer,
    b char{20}
};

foo bar2 {
    c integer
};

You can use the executable schemac to parse a FOO file with the following command:

./schemac --in ~/Desktop/example.foo --out_cc /tmp/example.cc --out_h /tmp/example.h

First, learn how bison & flex work together and how they are used in the files provided in the skeleton.

Then use the parsed schema to generate C++ code with a schema compiler.

Task 3

  • Implement the Delivery transaction.
  • Execute 1 million transactions composed of a mix of New Order (90%) and Delivery (10%) transactions.
  • Implement this analytical query using STL data structures for the join implementation.
    • Do not use any existing index structures.
    • Follow this interface.
    • The result of the query on our dataset should be 1367872001.25.
  • Execute OLTP and OLAP queries concurrently.
    • Execute 1 million transactions of the NewOrder/Delivery mix.
    • Run the analytical query concurrently using the fork system call (in imlabdb.cc).
    • Whenever an analytical query finishes, create a new snapshot using fork, so that exactly one snapshot and one query are active at any given time.
    • This example for using fork may be helpful.
  • Implement your own lazy hash table using this interface.
  • Measure the runtimes of the analytical query with the STL data structure and your hash table.
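The fork-based snapshot mechanism can be sketched as follows. The child process sees a copy-on-write snapshot of the address space at the time of fork(), so the analytical query runs on a consistent state while OLTP continues in the parent. The `balance` variable and the `runOnSnapshot` helper are illustrative, not part of the skeleton.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

static long balance = 100;  // stand-in for the database state

// Runs `query` in a child process on a copy-on-write snapshot taken by
// fork(); the parent (OLTP side) continues immediately and may modify
// the state without affecting the snapshot.
inline pid_t runOnSnapshot(void (*query)()) {
    pid_t child = fork();
    if (child == 0) {   // child: sees the state as of the fork
        query();
        std::_Exit(0);  // avoid running parent cleanup twice
    }
    return child;       // parent: keep processing transactions
}
```

The parent would later reap the child with waitpid and, per the task, immediately fork the next snapshot.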

Task 4

  • Implement query compilation of relational algebra trees to C++ code. Use the produce/consume (push) model.
  • You need to support the following operators: table scan, selection, print and inner join.
  • Test your code to make sure that the IUs are correctly propagated through the operators.
  • Manually build an algebra tree for this SQL query.
  • Write a query compiler that generates C++ code for the static algebra tree.
  • Add the generated C++ code to your repository and execute it in imlabdb.cc using the print operator.
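A minimal sketch of the produce/consume (push) model: Produce() asks the pipeline to start generating code, and each operator calls Consume() on its parent for every "tuple" position. The operator set and the emitted code fragments here are simplified assumptions (single attribute `a`, code emitted as strings), not the required interface.

```cpp
#include <iostream>
#include <memory>
#include <sstream>
#include <string>

struct Operator {
    Operator* parent = nullptr;
    virtual ~Operator() = default;
    virtual void Produce(std::ostream& out) = 0;
    virtual void Consume(std::ostream& out, Operator* child) = 0;
};

struct Print : Operator {
    std::unique_ptr<Operator> input;
    explicit Print(std::unique_ptr<Operator> in) : input(std::move(in)) {
        input->parent = this;
    }
    void Produce(std::ostream& out) override { input->Produce(out); }
    void Consume(std::ostream& out, Operator*) override {
        out << "std::cout << tuple.a << '\\n';\n";
    }
};

struct Selection : Operator {
    std::unique_ptr<Operator> input;
    std::string predicate;
    Selection(std::unique_ptr<Operator> in, std::string p)
        : input(std::move(in)), predicate(std::move(p)) {
        input->parent = this;
    }
    void Produce(std::ostream& out) override { input->Produce(out); }
    void Consume(std::ostream& out, Operator*) override {
        out << "if (" << predicate << ") {\n";
        parent->Consume(out, this);  // push the tuple upward
        out << "}\n";
    }
};

struct TableScan : Operator {
    std::string table;
    explicit TableScan(std::string t) : table(std::move(t)) {}
    void Produce(std::ostream& out) override {
        out << "for (auto& tuple : " << table << ") {\n";
        parent->Consume(out, this);  // body is generated by the parents
        out << "}\n";
    }
    void Consume(std::ostream&, Operator*) override {}
};
```

Calling Produce on `Print(Selection(TableScan("r"), "tuple.a > 10"))` emits one fused loop: the scan opens the loop, the selection nests its if, and the print emits the output statement inside.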

Task 5

  • Implement a simple parser for the following subset of SQL:

    • The select clause contains one or more attribute names.
    • The from clause contains one or more relation names.
    • The where clause is optional and can contain one or more selections (connected by "and").
    • You only need to support selections of the form "attribute = attribute" and "attribute = constant".
    • You can assume that each relation is referenced only once, and that all attribute names are unique.
  • Implement a semantic analysis, which takes the parsed SQL statement and the schema as input and computes an algebra tree that can be used as input to your code generator.

    • Report errors when non-existing attributes or tables are referenced.
    • You should report an error if a table has no join condition (i.e., a cross product would be necessary).
    • Build left-deep join trees based on the order of the relations in the from clause.
  • Implement a REPL ("read-eval-print-loop") for your database system:

    • Read one line from standard input.
    • Analyze it (report any errors and abort if necessary).
    • Generate C++ code and store it in a temporary file.
    • Call a C++ compiler and generate a temporary shared library (.so file).
    • Using dlopen, dlsym, and dlclose you can now load the shared library and execute the query in it (have a look at this example); you may need to compile with -fPIC and -rdynamic.
    • Measure and report the compilation and the execution times separately.
  • Test your database with these queries.
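The compile-and-load step of the REPL can be sketched as below. The file paths, the helper name, and the exported symbol are assumptions; it presumes g++ is on the PATH, and the program itself must be linked with -ldl on older glibc versions. The query function is declared extern "C" so dlsym can find it without name mangling.

```cpp
#include <dlfcn.h>
#include <cstdlib>
#include <fstream>
#include <string>

// Writes `source` to a temporary file, compiles it into a shared
// library, and returns the dlopen handle (nullptr on failure).
inline void* compileToLibrary(const std::string& source) {
    std::ofstream("/tmp/query.cc") << source;
    if (std::system("g++ -shared -fPIC -o /tmp/query.so /tmp/query.cc") != 0)
        return nullptr;
    return dlopen("/tmp/query.so", RTLD_NOW);
}
```

Usage: `auto run = reinterpret_cast<int (*)()>(dlsym(lib, "Run"));` fetches the entry point; calling `run()` executes the freshly compiled query, and `dlclose(lib)` unloads it afterwards. Take the two timestamps around compileToLibrary and around the call to report compilation and execution times separately.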

How many transactions per second can your implementation execute?

Make sure your builds are not failing!
Left Sidebar > CI/CD > Pipelines