RDBMS ingestion process can miss updates
banderous opened this issue · 0 comments
This bug concerns the RDBMS ingestion process as described in the official docs
The suggested approach is based on tracking row modification time in the sql_last_value of the jdbc plugin and using it to find any rows modified since the previous logstash query.
For this to be correct would require every changed row's modification_time to be >= the current max modification_time but this is not the case - a row's modification_time is the time the row was written and not the time the database transaction committed - which could be significantly later.
It is therefore possible for a row to be committed with a modification time lower than the current maximum modification time and never get indexed since it's modification_time will be < sql_last_value.
Steps to reproduce:
I was able to reproduce this by following the linked tutorial and using two different database connections (C1 & C2):
C1 - begin; INSERT INTO es_table (id, client_name) VALUES (4,"Missing!"); -- Never gets indexed
C2 - INSERT INTO es_table (id, client_name) VALUES (5,"latest"); -- Gets a higher modification_time, indexed
-- wait for logstash query to run
C1 - commit;
The row inserted on C1 never gets indexed since when committed it has a lower modification_time than the row committed on C2.