datayoga-io/datayoga

Enhanced Producer State Management for Incremental Data Retrieval

spicy-sauce opened this issue · 0 comments

The Producer needs a way to designate a field, such as last_updated, rowid, rownum, or any other relevant indicator, as a reference point for fetching only the records added since the last iteration. The latest value of this indicator will be persisted in a file, Redis, or another storage backend (starting with a file).
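As a rough illustration of the file-based starting point, the indicator value could be kept in a small JSON file. This is only a sketch: the class name `FileCheckpoint`, its `load`/`save` methods, and the `{"value": ...}` file layout are hypothetical and not part of any existing datayoga API.

```python
import json
import os
import tempfile
from typing import Any, Optional


class FileCheckpoint:
    """Hypothetical file-backed store for the last seen indicator value."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Optional[Any]:
        # Return None when no checkpoint has been written yet.
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return json.load(f)["value"]
        except FileNotFoundError:
            return None

    def save(self, value: Any) -> None:
        # Write to a temporary file in the same directory, then rename it
        # over the target, so a crash cannot leave a half-written checkpoint.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"value": value}, f)
        os.replace(tmp_path, self.path)
```

A Redis-backed variant could expose the same two methods, which would keep the Producer independent of where the value is stored.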

If the Producer cannot find a stored indicator value, it will fetch all available data and then store the maximum value of the indicator in the designated file. On subsequent iterations, it will read this value from memory (if available) or from the file, and query only for records where the specified field is greater than the stored value. Each iteration then retrieves only new records instead of re-reading the whole source.
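The flow described above might look roughly like the following. This is a minimal sketch using SQLite from the Python standard library as a stand-in source. The function name `fetch_incremental`, its parameters, and the JSON state file layout are assumptions made for illustration, not the actual Producer interface.

```python
import json
import sqlite3
from pathlib import Path
from typing import Any, List, Tuple


def fetch_incremental(
    conn: sqlite3.Connection, table: str, field: str, state_file: Path
) -> List[Tuple[Any, ...]]:
    """Fetch rows whose indicator field exceeds the stored value (hypothetical helper)."""
    # Read the stored indicator value; None means this is the first run.
    last = None
    if state_file.exists():
        last = json.loads(state_file.read_text(encoding="utf-8"))["value"]

    # Table and field names come from trusted configuration in this sketch;
    # only the indicator value is passed as a bound parameter.
    if last is None:
        cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{field}"')
    else:
        cur = conn.execute(
            f'SELECT * FROM "{table}" WHERE "{field}" > ? ORDER BY "{field}"',
            (last,),
        )

    # Locate the indicator column so the maximum can be read from the result.
    field_index = [col[0] for col in cur.description].index(field)
    rows = cur.fetchall()

    # Rows are sorted ascending, so the last row carries the maximum value.
    if rows:
        new_max = rows[-1][field_index]
        state_file.write_text(json.dumps({"value": new_max}), encoding="utf-8")
    return rows
```

One design point worth deciding early: with a strict greater-than comparison on a non-unique field such as last_updated, rows written later with the same value as the stored maximum would be skipped. A unique, increasing field such as rowid avoids this.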