Time slice index files before lookup
Ulrhol opened this issue · 1 comments
Today, before/after criteria are added to the list of lookups as timeQuery functions. When a lookup is performed, all files on a thread are sorted and then scanned for occurrences of matching objects. Every lookup is performed on each index file separately and takes a certain number of milliseconds. Time queries can skip a specific file, but all non-time queries are performed on every index file regardless of whether before/after criteria are set. This means a lot of lookups are performed in vain, since their results get discarded at the end anyway.
If before/after criteria were instead added as properties of the query, the thread could filter the list of files to scan before performing a lookup. This would take away a lot of unnecessary I/O and scanning of index files that fall outside the queried time frame, especially when you're reading at Gbit+ bandwidth and need to keep a couple of days or weeks of data on disk.
So, instead of getting the full list of files when performing a search:
for _, file := range t.getSortedFiles() { files = append(files, t.files[file]) }
There should be a function that slices out only the files that are of interest:
for _, file := range t.getSortedFilesSlice(q) { files = append(files, t.files[file]) }
The getSortedFilesSlice function would return only the files that contain data between AFTER and BEFORE, if either or both are set as criteria on the query. This is similar to how the timeQuery applies a[0] (after) and a[1] (before), but applied to filter which index files the lookup is performed on.
Have run into the same issue.