logstash-plugins/logstash-integration-jdbc

RDBMS ingestion process can miss updates

banderous opened this issue · 0 comments

This bug concerns the RDBMS ingestion process as described in the official docs.

The suggested approach is based on tracking each row's modification time via the jdbc input plugin's sql_last_value and using it to find any rows modified since the previous Logstash query.
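For reference, the polling statement suggested by the docs follows roughly this shape (a sketch, not the tutorial's exact query; the table name `es_table` and column `modification_time` come from the tutorial, and `:sql_last_value` is the placeholder the jdbc input plugin substitutes with the value saved after the previous run):

```sql
-- Sketch of the docs' polling pattern: fetch rows whose tracking column
-- has advanced past the value recorded after the previous run.
SELECT *
FROM es_table
WHERE modification_time > :sql_last_value
  AND modification_time < NOW()
ORDER BY modification_time ASC;
```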

For this to be correct, every changed row's modification_time would have to be >= the current maximum modification_time, but that is not guaranteed: a row's modification_time is set when the row is written, not when the database transaction commits, and the commit can happen significantly later.

It is therefore possible for a row to be committed with a modification_time lower than the current maximum modification_time and never get indexed, since its modification_time will be < sql_last_value.

Steps to reproduce:

I was able to reproduce this by following the linked tutorial and using two different database connections (C1 and C2):

C1 - begin; INSERT INTO es_table (id, client_name) VALUES (4, 'Missing!'); -- never gets indexed

C2 - INSERT INTO es_table (id, client_name) VALUES (5, 'latest'); -- gets a higher modification_time, gets indexed
  -- wait for the Logstash query to run
C1 - commit;

The row inserted on C1 never gets indexed: by the time it commits, its modification_time is lower than that of the row committed on C2, which has already advanced sql_last_value past it.
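The race can be modelled without a real database. The sketch below (plain Python; the `Row` and `Poller` names are hypothetical and it only models visibility rules, not any real JDBC plugin behaviour) assigns each row its modification_time at write time but makes it visible only at commit time, while the poller keeps sql_last_value at the highest modification_time it has seen:

```python
from dataclasses import dataclass, field

# Hypothetical model of the race: modification_time is assigned when the
# row is written, but the row becomes visible to queries only at commit.
@dataclass
class Row:
    id: int
    client_name: str
    modification_time: int   # assigned at write time, not commit time
    committed: bool = False  # visibility flag standing in for transaction state

@dataclass
class Poller:
    sql_last_value: int = 0
    indexed: list = field(default_factory=list)

    def run(self, table):
        # Mirrors "WHERE modification_time > :sql_last_value" evaluated
        # over committed (i.e. visible) rows only.
        new = [r for r in table
               if r.committed and r.modification_time > self.sql_last_value]
        for r in sorted(new, key=lambda r: r.modification_time):
            self.indexed.append(r.id)
            self.sql_last_value = r.modification_time

table = []
# C1: begin; INSERT ... VALUES (4, 'Missing!') -- written at t=1, not committed
row4 = Row(4, "Missing!", modification_time=1)
table.append(row4)
# C2: INSERT ... VALUES (5, 'latest') -- written and committed at t=2
row5 = Row(5, "latest", modification_time=2, committed=True)
table.append(row5)

poller = Poller()
poller.run(table)       # sees only row 5; sql_last_value advances to 2
row4.committed = True   # C1: commit; -- row 4 becomes visible, but stale
poller.run(table)       # row 4's modification_time (1) <= 2: skipped forever

print(poller.indexed)   # [5] -- row 4 is never indexed
```

Each poll advances sql_last_value, so once a later row has been indexed, an earlier-stamped row that commits afterwards can never satisfy the tracking predicate.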