indices get corrupted if keys are too big
CREATE TABLE IF NOT EXISTS T1 (B TEXT, C TEXT);
CREATE INDEX IF NOT EXISTS T1_idx on T1 (C,B);
INSERT INTO T1 (B, C) VALUES ('A', 'tms_doc_cfg');
INSERT INTO T1 (B, C) VALUES (RANDOMBLOB(17000), 'tms_doc_cfg_element_trans_xslt');
SELECT * FROM T1 WHERE C = 'tms_doc_cfg';
returns nothing.
SELECT C, C = 'tms_doc_cfg' FROM T1 WHERE C > 'tms_doc_cfg';
returns 1 in the second column for the 'A' row, i.e. a row whose C equals 'tms_doc_cfg' is returned by a query that asked for C strictly greater than 'tms_doc_cfg'. The index clearly fails.
If the key length is reduced (here, by shrinking the blob from 17000 to 1000 bytes), the select
CREATE TABLE IF NOT EXISTS T1 (B TEXT, C TEXT);
CREATE INDEX IF NOT EXISTS T1_idx on T1 (C,B);
INSERT INTO T1 (B, C) VALUES ('A', 'tms_doc_cfg');
INSERT INTO T1 (B, C) VALUES (RANDOMBLOB(1000), 'tms_doc_cfg_element_trans_xslt');
SELECT * FROM T1 WHERE C = 'tms_doc_cfg';
correctly returns 'A', 'tms_doc_cfg'.
The underlying problem is a known issue: LMDB keys must be small enough to fit on a single DB page. SQLite uses complete values verbatim as its index keys, so very long values don't work. The latest SQLightning code attempts to work around this limit by replacing long values with a hash, to save space. The current solution is incomplete, however: SQLite expects to be able to read the original values back out of the index key, and since LMDB returns only the hash, index lookups don't behave the way SQLite expects. This is also the root cause of issue #4; I am going to close that as a dup of this issue.
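To make the failure concrete, here is a minimal C sketch of the hash workaround (the names MAX_KEYSIZE, fnv1a64, and encode_key are hypothetical, and the real size limit and hash function differ; this is not the actual SQLightning code):

#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_KEYSIZE 511  /* hypothetical per-key limit; the real limit depends on the LMDB page size */

/* Stand-in 64-bit FNV-1a hash; the hash SQLightning uses may differ. */
static uint64_t fnv1a64(const unsigned char *p, size_t n) {
    uint64_t h = 14695981039346656037ULL;
    while (n--) { h ^= *p++; h *= 1099511628211ULL; }
    return h;
}

/* Replace an oversized key with its hash, as the workaround does.
 * Returns the number of bytes written to out (out must hold MAX_KEYSIZE bytes). */
static size_t encode_key(const unsigned char *key, size_t len, unsigned char *out) {
    if (len <= MAX_KEYSIZE) {
        memcpy(out, key, len);
        return len;
    }
    uint64_t h = fnv1a64(key, len);
    memcpy(out, &h, sizeof h);  /* problem 1: the original bytes are gone, so SQLite
                                   cannot read the value back out of the index key */
    return sizeof h;            /* problem 2: the byte order of hashes is unrelated to
                                   the order of the original values, so range scans
                                   and cursor positioning land on the wrong entries */
}

Equality lookups on the hash can still succeed, but any query that needs the value itself, or its position in the sort order, gets an answer SQLite never expected.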
The hash approach is obviously a dead end. The new approach will be to adopt an OpenLDAP-style nested index. Long keys will be broken into chunks and chained together. This will handle arbitrarily long keys and will preserve the ordering as well.
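For intuition on how chunking can preserve ordering, here is a minimal sketch of an order-preserving chunk encoding (hypothetical names, with an 8-byte chunk size chosen for readability; the planned OpenLDAP-style nested index chains the chunks through sub-pages rather than concatenating them, but the ordering invariant is the same):

#include <string.h>
#include <stddef.h>

#define CHUNK 8  /* illustrative; real chunks would be sized near an LMDB page */

/* Split key into fixed-size chunks, each followed by one marker byte: CHUNK+1
 * means "full chunk, more follows"; a smaller value is the count of meaningful
 * bytes in the final chunk.  memcmp() over the encoding then orders keys of
 * any length exactly as memcmp() over the originals would.
 * out must hold ((len / CHUNK) + 1) * (CHUNK + 1) bytes. */
static size_t chunk_key(const unsigned char *key, size_t len, unsigned char *out) {
    size_t n = 0;
    do {
        size_t take = len < CHUNK ? len : CHUNK;
        memcpy(out + n, key, take);
        memset(out + n + take, 0, CHUNK - take);  /* zero-pad a short final chunk */
        n += CHUNK;
        key += take;
        len -= take;
        out[n++] = (unsigned char)(len ? CHUNK + 1 : take);  /* continuation marker */
    } while (len);
    return n;
}

Because every original byte is retained across the chunks, SQLite can read the full value back, and because the marker byte sorts a short final chunk below a full continuing one, a prefix always orders before its extensions, matching plain memcmp on the unchunked values.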