Feature: Incremental database update
davidmezzetti opened this issue · 0 comments
Currently, ETL processes assume each run is a full database reload. This works well for smaller datasets but is inefficient for larger ones.
Add the ability to set the path to an existing database and copy unmodified records from the existing source. This way only new/updated records are processed each run.
SQLite needs a system for reading and inserting articles/sections from another database.
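A minimal sketch of one way the SQLite copy could work, using `ATTACH DATABASE` to read from the existing file. The schema, the `copy_unmodified` function and the `modified_ids` argument are assumptions for illustration, not the project's actual tables or API.

```python
import sqlite3

# Hypothetical schema for illustration only; the real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE IF NOT EXISTS sections (
    id INTEGER PRIMARY KEY, article TEXT, text TEXT
);
"""

def copy_unmodified(target_path, existing_path, modified_ids):
    """Copy articles/sections whose id is NOT in modified_ids
    from the existing database into the target database.
    Returns the number of articles copied."""
    db = sqlite3.connect(target_path)
    try:
        # Make the previous database readable under the name "existing"
        db.execute("ATTACH DATABASE ? AS existing", (existing_path,))

        # Stage the ids that will be reprocessed this run
        db.execute("CREATE TEMP TABLE modified (id TEXT PRIMARY KEY)")
        db.executemany(
            "INSERT OR IGNORE INTO temp.modified VALUES (?)",
            ((i,) for i in modified_ids),
        )

        # Carry over everything that does not need reprocessing
        copied = db.execute(
            "INSERT INTO main.articles (id, title) "
            "SELECT id, title FROM existing.articles "
            "WHERE id NOT IN (SELECT id FROM temp.modified)"
        ).rowcount

        # Section ids are regenerated so they cannot collide with new rows
        db.execute(
            "INSERT INTO main.sections (article, text) "
            "SELECT article, text FROM existing.sections "
            "WHERE article NOT IN (SELECT id FROM temp.modified)"
        )
        db.commit()
        return copied
    finally:
        db.close()
```

With something like this in place, the ETL run would only need to parse and insert the ids passed as `modified_ids`.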
Elasticsearch already handles most of this. It just needs a small change to create the articles index only if it doesn't already exist. Merges will be handled by Elasticsearch based on the article id.
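A rough sketch of the index check, assuming the official `elasticsearch` Python client, whose `indices.exists` and `indices.create` calls accept an `index` keyword. The `ensure_index` helper name is hypothetical.

```python
def ensure_index(client, name="articles"):
    """Create the index only when it is missing.

    `client` is assumed to be an elasticsearch.Elasticsearch instance.
    Returns True if the index was created, False if it already existed.
    """
    if client.indices.exists(index=name):
        # Existing index is left untouched so prior documents are kept
        return False

    client.indices.create(index=name)
    return True

# Example usage (requires a running cluster; the URL is a placeholder):
#   from elasticsearch import Elasticsearch
#   ensure_index(Elasticsearch("http://localhost:9200"))
```

Because the existing index is no longer recreated, documents from earlier runs remain in place and only new or updated articles need to be indexed.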