This project studies schema design for the SQL-over-NoSQL architecture under the block-as-a-value (BaaV) model. The main objective is to minimize the memory footprint of SQL layers, which typically consume large amounts of memory even over moderately sized databases. The main pillar of our approach is the BaaV model for relations in NoSQL (key-value) stores, which enables SQL layers to retrieve values instead of tuples (column families). We propose two types of BaaV implementation and study the schema design of BaaV stores to optimize the memory footprint of SQL evaluation in the SQL-over-NoSQL architecture.
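
As a rough illustration of the BaaV idea, the sketch below groups the tuples of a relation into keyed blocks, so that one lookup returns a whole block as a single value. It assumes a block is the list of Y-tuples that share the same value on the key attributes X; the function name `build_baav`, the `orders` relation, and its attribute names are hypothetical and only serve the example.

```python
from collections import defaultdict


def build_baav(rows, key_attrs, value_attrs):
    """Group rows into keyed blocks: tuple over X -> list of Y-tuples.

    Assumption: a BaaV instance maps each distinct X-value to the block
    of Y-tuples that co-occur with it in the relation.
    """
    store = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in key_attrs)
        store[key].append(tuple(row[a] for a in value_attrs))
    return dict(store)


# Hypothetical relation orders(cust, item, qty).
orders = [
    {"cust": "c1", "item": "pen", "qty": 2},
    {"cust": "c1", "item": "ink", "qty": 1},
    {"cust": "c2", "item": "pad", "qty": 5},
]

# X = (cust), Y = (item, qty): one get on ("c1",) yields the whole block.
store = build_baav(orders, ["cust"], ["item", "qty"])
block = store[("c1",)]
```

Here the SQL layer would issue a single get per customer instead of fetching each tuple separately. Whether that reduces the memory footprint in practice is exactly what the questions below ask.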
- minimizing memory footprint of SQL layers ≠ minimizing evaluation time
- Does BaaV help with minimizing memory footprint of SQL layers? Can this be done via secondary index on NoSQL (wide-column-family stores)?
- How to design BaaV schema to maximize its benefits? Modeling design space? Optimal trade-off?
- [ ] Test the memory footprint of common SQL layers (e.g., Spark) for answering SQL queries over relations in NoSQL (wide-column-family stores). This is to justify and motivate the study of minimizing memory footprint of SQL evaluation.
- [ ] BaaV schema (AC) implementation:
  - [ ] using compound values, with extra overhead of (un)marshalling
    - how much overhead? is it worth it?
  - [ ] using wide-column-family with sorted (multi-fragmented) keys
    - sorted keys to represent the entire BaaV schema (XY), or just to encode X?
- [ ] BaaV schema design for given workloads to minimize total memory footprint
  - what criteria to consider?
  - what’s the relationship between memory footprint and evaluation time for SQL-over-NoSQL?
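
The two implementation options in the task list can be contrasted with a minimal sketch. It is not tied to any particular store: a Python `dict` stands in for a key-value store and a sorted list of strings stands in for a wide-column store with ordered keys. The separator `SEP`, the use of JSON for marshalling, and all function names are assumptions made for illustration. The sorted-key variant shown here encodes the full XY in the key, which is only one of the two choices raised above.

```python
import json
from bisect import bisect_left

SEP = "\x00"  # hypothetical fragment separator, assumed absent from the data


# Option 1: compound value. The whole block is marshalled into one value.
def put_compound(kv, x, block):
    kv[SEP.join(x)] = json.dumps(block)


def get_compound(kv, x):
    # Unmarshalling cost is paid on every read of the block.
    return [tuple(t) for t in json.loads(kv[SEP.join(x)])]


# Option 2: sorted multi-fragment keys. Each Y-tuple becomes its own key,
# prefixed by the encoded X, so a block is a contiguous key range.
def put_sorted(keys, x, block):
    for y in block:
        keys.append(SEP.join(x) + SEP + SEP.join(y))
    keys.sort()


def get_sorted(keys, x):
    prefix = SEP.join(x) + SEP
    i = bisect_left(keys, prefix)  # first key >= prefix
    out = []
    while i < len(keys) and keys[i].startswith(prefix):
        out.append(tuple(keys[i][len(prefix):].split(SEP)))
        i += 1
    return out


# All values are strings here so both encodings round-trip identically.
block = [("pen", "2"), ("ink", "1")]

kv = {}
put_compound(kv, ("c1",), block)

keys = []
put_sorted(keys, ("c1",), block)
put_sorted(keys, ("c10",), [("pad", "5")])

from_compound = get_compound(kv, ("c1",))
from_sorted = get_sorted(keys, ("c1",))
```

Both reads return the same block, but the compound variant pays for (un)marshalling while the sorted-key variant pays for one stored key per tuple and a prefix scan. Timing the `json` calls and comparing stored sizes on representative blocks would be one way to approach the "how much?" question.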