real time indexing
karlicoss opened this issue · 6 comments
E.g. something inotify based. That would make the implementation quite a bit more complex than it is at the moment.
Also due to the nature of many exports (periodic), it won't be realtime unless the underlying exports are realtime.
Still, it could at least detect source file changes, etc.
Also would work well in conjunction with Grasp.
Might need to be careful about closing libmagic, see #124 (comment)
Relevant: I've implemented 'almost realtime' indexing recently:
promnesia/src/promnesia/dump.py
Line 18 in c442081
E.g. you can have a separate config file containing only your text notes (which should be indexed very fast). Then if you run
PROMNESIA_INDEX_POLICY=update promnesia index --config /path/to/small/config
it will merge the result into the main database.
That means you can run it very often (e.g. every five minutes), or potentially combine it with entr to achieve 'realtime' indexing.
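A minimal sketch of the "rerun on change" idea using only the Python standard library instead of entr. The environment variable and the command line are the ones quoted above; the helper names (snapshot, index_command, watch) and the polling interval are assumptions made for illustration, not part of Promnesia.

```python
import os
import subprocess
import time
from pathlib import Path


def snapshot(paths):
    """Map each existing path to its last-modified time in nanoseconds."""
    return {str(p): Path(p).stat().st_mtime_ns for p in paths if Path(p).exists()}


def index_command(config):
    """The index command quoted above, as an argv list."""
    return ["promnesia", "index", "--config", str(config)]


def watch(paths, config, interval=5.0):
    """Poll `paths` forever and reindex in update mode whenever one changes."""
    # Hypothetical helper: sets the env variable from the comment above.
    env = {**os.environ, "PROMNESIA_INDEX_POLICY": "update"}
    seen = snapshot(paths)
    while True:
        time.sleep(interval)
        current = snapshot(paths)
        if current != seen:
            seen = current
            subprocess.run(index_command(config), env=env, check=False)
```

Calling watch(["/path/to/notes.org"], "/path/to/small/config") would block and reindex after each detected change. Polling modification times is cruder than inotify, but it needs no extra dependencies.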
The last comment here needs to make it into main docs.
Even better if a new option is added, like promnesia index --update, so that the above preserves existing items in the server's database:
promnesia index --update --config <small-config> --secrets <secret-file>
But what about de-duplication? Are there any issues with updates?
Yep, good idea to pass it in cmdline args! It was somewhat experimental at first, so I made it an env variable, but it seems to work pretty well (apart from one minor race condition I might need to fix first).
Maybe it even makes sense to make --update mode the default? I guess the worst that would happen is some stale entries would be in the database -- then if the user notices them, they can do a full reindex manually.
Regarding deduplication -- not sure what you mean?
This is how it works at the moment
promnesia/src/promnesia/dump.py
Lines 61 to 79 in e3b21cb
So it clears all the entries corresponding to the data source first and then inserts them. Hopefully shouldn't result in duplication!
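To illustrate why clear-then-insert shouldn't duplicate anything, here is a small sketch with sqlite3. The table layout (visits with src and url columns) and the update_source helper are hypothetical and much simpler than the real database; only the delete-per-source-then-insert pattern is taken from the description above.

```python
import sqlite3


def update_source(conn, src, urls):
    """Replace all rows for one source: delete its old rows, then insert the new ones."""
    with conn:  # the delete and the inserts are committed together
        conn.execute("DELETE FROM visits WHERE src = ?", (src,))
        conn.executemany(
            "INSERT INTO visits (src, url) VALUES (?, ?)",
            [(src, url) for url in urls],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (src TEXT, url TEXT)")  # hypothetical schema

update_source(conn, "notes", ["https://a.example", "https://b.example"])
update_source(conn, "browser", ["https://c.example"])

# Reindexing "notes" replaces its rows and leaves "browser" untouched.
update_source(conn, "notes", ["https://a.example"])
rows = sorted(conn.execute("SELECT src, url FROM visits").fetchall())
```

Running the update for the same source any number of times leaves exactly one copy of each of its entries, while rows from other sources are preserved.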
Hmm, seems that it was closed automatically by GitHub -- we don't really have realtime indexing yet, so I'll reopen.
Perhaps for actual 'realtime' this would need proper HPI support.
E.g. HPI module exposes a generator or something, which Promnesia can poll on (presumably, in a loop over all promnesia sources).
Not sure how easy it'll be to make it asynchronous enough though, and it's also going to be tricky to 'expire' stale Visits, but it could work well for incremental/synthetic sources (which typically are the most computationally expensive).
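A rough sketch of what the polling loop could look like, assuming each source is a generator that yields the batch of items that appeared since the previous poll (possibly empty). Neither HPI nor Promnesia exposes such an interface as far as this thread says; fake_source and poll_all are hypothetical names, and expiring stale Visits is not handled.

```python
from typing import Dict, Iterator, List


def fake_source(batches: List[List[str]]) -> Iterator[List[str]]:
    """Hypothetical source: each poll yields the items seen since the last poll."""
    for batch in batches:
        yield batch
    while True:
        yield []  # nothing new from here on


def poll_all(sources: Dict[str, Iterator[List[str]]]) -> Dict[str, List[str]]:
    """Poll every source once and return the non-empty batches by source name."""
    fresh = {}
    for name, gen in sources.items():
        batch = next(gen, [])  # an exhausted generator counts as "nothing new"
        if batch:
            fresh[name] = batch
    return fresh


sources = {
    "notes": fake_source([["a"], [], ["b", "c"]]),
    "browser": fake_source([[], ["x"]]),
}
first = poll_all(sources)
second = poll_all(sources)
third = poll_all(sources)
```

Each call to poll_all is one pass of the loop over all sources; the returned batches could then be merged into the database per source, in the same way the update mode does.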