forTEXT/catma

Lazy loading and unloading graph index

Closed · 1 comment

The analysis and visualization module relies on a graph index of Documents and Annotations. Currently this index is built up front with all Documents and Annotations when a CATMA Project is opened. In addition to the problems described in #303 and #304, this contributes to poor performance when opening a CATMA Project and causes out-of-memory errors on larger Projects with lots of text and/or lots of Annotations. Lazy loading of Documents and Annotations, which holds only parts of the data in memory, was already an idea when working on CATMA 6 but didn't make it into the release.

Currently we use TinkerGraph as the implementation of the graph index. Recent performance measurements showed that the Guava graph implementation is up to two times faster, and it also seems easier to unload data that is not currently needed.

The TinkerGraph implementation will be replaced by a lazy-loading Guava graph implementation that also supports unloading data that is not currently in use.

The changes outlined above have now been released with 7.0.0.

Ultimately we didn't end up using any graph implementation, but rather just POJOs with a heavier reliance on the caching mechanisms provided by Google's Guava libraries. See, for example, /src/main/java/de/catma/repository/git/graph/lazy/LazyGraphProjectHandler.java and /src/main/java/de/catma/repository/git/graph/lazy/LazyGraphProjectIndexer.java
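
For readers unfamiliar with the pattern, here is a minimal sketch of lazy loading with automatic unloading. It is not taken from the CATMA sources: the class name `LazyDocumentCache`, its methods, and the string-based loader are all hypothetical, and it uses a plain `LinkedHashMap` from the JDK as a dependency-free stand-in. Guava's `CacheBuilder` and `LoadingCache` offer comparable size-bounded, load-on-demand behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical illustration only, not CATMA code.
 * Loads document content on first access and unloads the least recently
 * used entry once more than maxLoaded documents are held in memory.
 */
class LazyDocumentCache {
    private final Function<String, String> loader; // assumed loader: document ID -> content
    private final Map<String, String> loaded;
    private int loadCount = 0;

    LazyDocumentCache(int maxLoaded, Function<String, String> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently accessed
        this.loaded = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Called after each put: drop the eldest entry when over capacity
                return size() > maxLoaded;
            }
        };
    }

    /** Returns the content, loading it only if it is not already in memory. */
    synchronized String get(String documentId) {
        String content = loaded.get(documentId);
        if (content == null) {
            content = loader.apply(documentId);
            loadCount++;
            loaded.put(documentId, content);
        }
        return content;
    }

    /** Checks residency without affecting the access order. */
    synchronized boolean isLoaded(String documentId) {
        return loaded.containsKey(documentId);
    }

    synchronized int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazyDocumentCache cache = new LazyDocumentCache(2, id -> "content of " + id);
        cache.get("a");
        cache.get("b");
        cache.get("a"); // served from memory, no reload
        cache.get("c"); // exceeds the limit, so the least recently used entry is unloaded
        System.out.println("b loaded: " + cache.isLoaded("b"));
        System.out.println("loads so far: " + cache.getLoadCount());
    }
}
```

Nothing is loaded when the cache is created, so opening a Project would not pay for data that is never looked at, and the size bound caps memory use at the cost of reloading entries that were unloaded earlier.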