yugabyte/yugabyte-db

[docdb] Pack columns in DocDB storage format for better performance

rkarthik007 opened this issue · 2 comments

Jira Link: DB-2216
Packing columns into a single RocksDB entry per row, instead of one entry per column (as we do currently), improves YSQL performance. Below are benchmark results using a JSONB column as a proxy for a packed representation on disk.

Setup

In this setup, there are two tables, A and B (each with 128 columns). For each row in A, there are 4 rows in B. Consider the following two ways of creating A and B:

  1. A1 and B1 both have 128 columns each.
  2. A2 and B2 have around 15 of the original columns each, and one additional column of type JSONB containing the key-value attributes of the remaining columns. This last column can be used to simulate a packed column representation.
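To make the second layout concrete, here is a minimal Python sketch of how a 128-column row could be reshaped into the A2/B2 form. The column names (`c0`..`c127`), the JSONB column name `attrs`, and the exact count of 15 regular columns are assumptions for illustration only; the benchmark's actual schema is not given in this issue.

```python
import json

NUM_COLUMNS = 128   # from the setup above
NUM_REGULAR = 15    # "around 15" in the setup; the exact count is an assumption


def make_wide_row(row_id):
    """Build a hypothetical A1-style row with 128 separate columns."""
    return {f"c{i}": f"v{row_id}_{i}" for i in range(NUM_COLUMNS)}


def to_packed_row(wide_row, regular_columns):
    """Keep the regular columns as-is and fold the rest into one JSON value.

    The 'attrs' column name is hypothetical; it stands in for the JSONB column.
    """
    regular = set(regular_columns)
    packed = {name: wide_row[name] for name in regular_columns}
    remaining = {k: v for k, v in wide_row.items() if k not in regular}
    packed["attrs"] = json.dumps(remaining, sort_keys=True)
    return packed


wide = make_wide_row(1)
regular_columns = [f"c{i}" for i in range(NUM_REGULAR)]
packed = to_packed_row(wide, regular_columns)
print(len(wide), "columns in the A1-style row")
print(len(packed), "columns in the A2-style row")
```

With one storage entry per column, the A2-style row produces far fewer entries than the A1-style row, which is why it serves as a rough stand-in for a packed format.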

Inserts

The following was observed when performing concurrent inserts from multiple clients.

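| Table Name | Num Indexes | Batch Size | Num Clients | Throughput |
|------------|-------------|------------|-------------|------------|
| A1         | 0           | 0          | 32          | 1.5K       |
| A2         | 0           | 0          | 32          | 2.8K       |
| A1         | 2           | 20         | 8           | 1.5K       |
| A2         | 2           | 20         | 8           | 3K         |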

Queries

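| Table Name | Operation Type  | Num Clients | Throughput | Latency |
|------------|-----------------|-------------|------------|---------|
| A1         | JOIN on A1, B1  | 128         | 4K         | 32ms    |
| A2         | JOIN on A2, B2  | 128         | 10K        | 15ms    |
| A1         | PK QUERY on A1  | 128         | 14K        | 6ms     |
| A2         | PK QUERY on A2  | 128         | 28K        | 4ms     |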

Another potential benefit of packed columns is reduced space amplification: because of how we use RocksDB, the key portion of a row is currently written again for every column in the table.

In practice, though, RocksDB's prefix compression already recovers some of that key overhead, so we would need to test with it disabled to understand whether the space amplification is really a problem.
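A rough way to reason about the key overhead is a toy model like the one below. It assumes a hypothetical key encoding (row key followed by a 2-byte column id) and a simplified prefix compression that stores only the bytes a key does not share with the previous key. Neither is DocDB's real key format or RocksDB's actual block encoding; the sketch only illustrates why prefix compression may hide much of the repeated-key cost.

```python
def per_column_keys(row_key, num_columns):
    """Hypothetical per-column keys: row key plus a 2-byte column id."""
    return [row_key + col.to_bytes(2, "big") for col in range(num_columns)]


def raw_key_bytes(keys):
    """Total key bytes when every key is stored in full."""
    return sum(len(k) for k in keys)


def prefix_compressed_key_bytes(keys):
    """Total key bytes when each key stores only its non-shared suffix.

    Simplified model: compare each key with the one before it and count
    only the bytes after their common prefix.
    """
    total = 0
    prev = b""
    for key in keys:
        shared = 0
        for a, b in zip(prev, key):
            if a != b:
                break
            shared += 1
        total += len(key) - shared
        prev = key
    return total


row_key = b"row-0001"                    # 8-byte illustrative row key
keys = per_column_keys(row_key, 128)     # one key per column
raw = raw_key_bytes(keys)
compressed = prefix_compressed_key_bytes(keys)
packed = len(row_key)                    # packed row: the key is written once
print("per-column, raw keys:      ", raw)
print("per-column, prefix-shared: ", compressed)
print("packed row:                ", packed)
```

In this toy model the gap between the per-column and packed layouts shrinks considerably once shared prefixes are elided, which is the reason for measuring with prefix compression disabled before drawing conclusions.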

Packed Row format support landed.