Reassess harvest file storage and usage

Question

Closed this issue 10 years ago · 2 comments

GoogleCodeExporter commented 10 years ago

A number of issues here. Current state of affairs:

 * Harvest files are stored on disk and edited there by sysadmins, but the system does not use them from there.
 * Housekeeping periodically syncs this directory with storage.
 * The storage OID algorithm uses a built-in system utility to combine hostname, username and file path. This causes migration issues.
 * Individual objects have references to their harvest file OIDs stored against them.
 * On first execution, the Solr Indexer caches an instantiated rules file in memory for performance.
 * There are multiple instantiated Indexers throughout the system. This setup allows different versions of a cached rules file to claim the same OID over time.
 * Once a rules file is cached, there is no point in syncing it from disk, as the updated copy will never be used until the system restarts.
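
To illustrate the migration problem, here is a minimal Python sketch. It is not the actual storage code: the function name, the MD5 digest, the separator and the example host and path values are all assumptions made for illustration. The only detail taken from the issue is that the OID combines hostname, username and file path.

```python
import hashlib


def host_bound_oid(hostname: str, username: str, file_path: str) -> str:
    """Hypothetical OID: a digest over hostname, username and file path.

    The real utility may combine these differently; MD5 and the ':'
    separator are assumptions chosen only to make the sketch concrete.
    """
    key = f"{hostname}:{username}:{file_path}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Same file and user, but the system has moved to a new host (made-up names).
old = host_bound_oid("old-server", "fascinator", "/opt/harvest/rules.py")
new = host_bound_oid("new-server", "fascinator", "/opt/harvest/rules.py")

# The OID changes with the host, so harvest file OIDs already stored
# against individual objects no longer match after a migration.
print(old == new)  # False
```

Any scheme that includes the hostname or username behaves this way, whatever digest is used.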

Suggested changes:

 * Harvest files should use a different OID method; this is just to simplify migrations. Arguably they would not even need to be kept in storage, and could be accessed directly from disk.
 * Indexer caches shouldn't be held unchanged for so long. Some periodic or automatic update mechanism should exist.
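
Both suggestions could be sketched roughly as follows, in Python for brevity. This is one possible approach under stated assumptions, not a design taken from the project: `portable_oid`, `ReloadingRulesCache` and the `loader` callback are hypothetical names, and checking the file's modification time is just one way to get automatic updates.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable, Dict, Tuple


def portable_oid(harvest_dir: Path, file_path: Path) -> str:
    """Hypothetical OID derived only from the path relative to the harvest dir.

    Hostname and username are left out, so the OID survives a migration
    as long as the layout inside the harvest directory is unchanged.
    """
    relative = file_path.resolve().relative_to(harvest_dir.resolve()).as_posix()
    return hashlib.md5(relative.encode("utf-8")).hexdigest()


class ReloadingRulesCache:
    """Caches loaded rules files, reloading when the file on disk changes."""

    def __init__(self, loader: Callable[[Path], object]) -> None:
        self._loader = loader  # assumed callback that instantiates a rules file
        self._entries: Dict[Path, Tuple[int, object]] = {}

    def get(self, path: Path) -> object:
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        # Reload on first use, or when the modification time has changed.
        if cached is None or cached[0] != mtime:
            cached = (mtime, self._loader(path))
            self._entries[path] = cached
        return cached[1]
```

The cost is one `stat` call per lookup. If that is too expensive, the same check could run on a timer instead. Sharing a single cache between Indexers would also avoid different instances holding different versions under the same OID.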

Original issue reported on code.google.com by greg.pen...@gmail.com on 6 Sep 2011 at 1:01

Answer 1 · 2015-05-06T08:23:57.000Z

Original comment by greg.pen...@gmail.com on 23 May 2012 at 1:11

Added labels: Type-Task
Removed labels: Type-Defect

Answer 2 · 2015-05-06T08:23:57.000Z

Migrated to https://github.com/the-fascinator/the-fascinator/issues/7

Original comment by duncan.q...@gmail.com on 14 Jan 2013 at 5:19

Changed state: MovedToGitHub