[docdb] Pack columns in DocDB storage format for better performance
rkarthik007 opened this issue · 2 comments
Jira Link: DB-2216
Packing all columns of a row into a single RocksDB entry, instead of writing one entry per column as we do currently, improves YSQL performance. The tables below show benchmark results that use a JSONB column as a proxy for a packed on-disk representation.
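To make the difference between the two layouts concrete, here is a minimal sketch in Python. It is a toy model, not DocDB's actual encoding: the function names, the tuple-shaped keys, and the column names `c0`..`c127` are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

def per_column_entries(row_key: str, row: Dict[str, Any]) -> List[Tuple[tuple, Any]]:
    """Toy model of the per-column layout: one key-value entry per column.

    The row key is repeated in the key of every entry.
    """
    return [((row_key, column), value) for column, value in row.items()]

def packed_entry(row_key: str, row: Dict[str, Any]) -> List[Tuple[tuple, Any]]:
    """Toy model of a packed layout: a single entry holding all column values."""
    return [((row_key,), tuple(row.values()))]

# Hypothetical 128-column row, matching the table width used in the benchmark.
row = {f"c{i}": i for i in range(128)}

unpacked = per_column_entries("row1", row)
packed = packed_entry("row1", row)

print(len(unpacked), len(packed))  # 128 1
```

In this model a single-row insert or point read touches 128 entries in the per-column layout and one entry in the packed layout.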
Setup
In this setup, there are two tables, A and B, each with 128 columns. For each row in A, there are 4 rows in B. Consider the following two ways of creating A and B:
- A1 and B1 have 128 columns each.
- A2 and B2 have around 15 of the original columns each, and one additional column of type JSONB containing the key-value attributes of the remaining columns. This last column can be used to simulate a packed column representation.
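The A2/B2 shape can be sketched as a transformation of a full-width row. This is only an illustration of the proxy described above: the helper `to_proxy_row`, the column names `col0`..`col127`, and the JSON column name `attrs` are hypothetical, and the benchmark's actual schema is not shown in this issue.

```python
import json
from typing import Any, Dict, List

def to_proxy_row(row: Dict[str, Any], kept: List[str], json_col: str = "attrs") -> Dict[str, Any]:
    """Keep `kept` as regular columns and fold every other column into one JSON document."""
    kept_set = set(kept)
    proxy = {name: row[name] for name in kept}
    rest = {name: value for name, value in row.items() if name not in kept_set}
    # The JSON text stands in for the value that would be stored in the JSONB column.
    proxy[json_col] = json.dumps(rest, sort_keys=True)
    return proxy

# Hypothetical A1-style row with 128 columns.
full_row = {f"col{i}": f"v{i}" for i in range(128)}
# Assumption: the first 15 columns are the ones kept as regular columns.
kept_columns = [f"col{i}" for i in range(15)]

proxy_row = to_proxy_row(full_row, kept_columns)
print(len(full_row), len(proxy_row))  # 128 16
```

The proxy row has 16 columns (15 regular plus the JSON one), so under a one-entry-per-column layout it produces far fewer entries than the 128-column row, which is what makes it a rough stand-in for packing.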
Inserts
The following was observed when performing concurrent inserts from multiple clients.
Table Name | Num Indexes | Batch Size | Num Clients | Throughput |
---|---|---|---|---|
A1 | 0 | 0 | 32 | 1.5K |
A2 | 0 | 0 | 32 | 2.8K |
A1 | 2 | 20 | 8 | 1.5K |
A2 | 2 | 20 | 8 | 3K |
Queries
Table Name | Operation Type | Num Clients | Throughput | Latency |
---|---|---|---|---|
A1 | JOIN on A1, B1 | 128 | 4K | 32ms |
A2 | JOIN on A2, B2 | 128 | 10K | 15ms |
A1 | PK QUERY on A1 | 128 | 14K | 6ms |
A2 | PK QUERY on A2 | 128 | 28K | 4ms |
Another potential benefit of packed columns is reduced space amplification: because of how we use RocksDB, the key portion is currently written again for every column in the table.
In practice, though, we already get some prefix compression on keys from RocksDB, so we would need to test with that disabled to understand whether space amplification is really a problem.
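A rough way to reason about the key overhead is to count key bytes under each layout. The sketch below is a toy model under stated assumptions: the row key, the 2-byte column ids, and all function names are made up, and the shared-prefix calculation only approximates key prefix compression (it ignores restart points and the per-entry length metadata a real block format stores).

```python
from typing import List

def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the common prefix of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def key_bytes_per_column(row_key: bytes, column_ids: List[bytes]) -> int:
    """Key bytes when every column entry stores the full row key plus its column id."""
    return sum(len(row_key) + len(cid) for cid in column_ids)

def key_bytes_packed(row_key: bytes) -> int:
    """Key bytes when the whole row is a single entry."""
    return len(row_key)

def key_bytes_prefix_elided(row_key: bytes, column_ids: List[bytes]) -> int:
    """Key bytes when each key stores only the suffix not shared with the previous key."""
    keys = sorted(row_key + cid for cid in column_ids)
    total = len(keys[0])
    for prev, cur in zip(keys, keys[1:]):
        total += len(cur) - shared_prefix_len(prev, cur)
    return total

# Hypothetical row key and 128 two-byte column ids.
row_key = b"table-A1/row-000042"
column_ids = [i.to_bytes(2, "big") for i in range(128)]

full = key_bytes_per_column(row_key, column_ids)
packed = key_bytes_packed(row_key)
elided = key_bytes_prefix_elided(row_key, column_ids)
print(full, elided, packed)
```

In this model, prefix elision removes most of the repeated key bytes, which is why the real impact has to be measured with prefix compression turned off rather than assumed.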
Packed Row format support landed.