Findwise/Hydra

Nullpointer in the Memory Cache in hydra 0.4.0


I'm getting a NullPointerException in the memory cache in Hydra 0.4.0. I don't really know where to start debugging this.

2014-01-15 17:02:30,064 [Thread-4] ERROR com.findwise.hydra.Main - Got an uncaught exception. Shutting down Hydra
java.lang.NullPointerException: null
    at com.findwise.hydra.MemoryCache.removeStale(MemoryCache.java:183) ~[hydra-core.jar:na]
    at com.findwise.hydra.CachingDocumentNIO.flush(CachingDocumentNIO.java:372) ~[hydra-core.jar:na]
    at com.findwise.hydra.CachingDocumentNIO$CacheMonitor.run(CachingDocumentNIO.java:424) ~[hydra-core.jar:na]
2014-01-15 17:02:30,064 [Thread-4] INFO com.findwise.hydra.Main - Got shutdown request...

Maybe this fails in combination with discarding documents?

Or with outputting, rather; I didn't have a discarding stage in that pipeline.

Is the document removed from the memory cache if it has already been output?

Hm, just thinking out loud here, without having debugged it:
So the relevant line is https://github.com/Findwise/Hydra/blob/0.4.0/database/src/main/java/com/findwise/hydra/MemoryCache.java#L183

Entry<DocumentID<T>, Long> entry = it.next();
if (time - entry.getValue() > stalerThanMs) {
    DatabaseDocument<T> d = getDocumentById(entry.getKey());
    list.add(d);
    map.remove(d.getID());      // <- the NPE is thrown here
    it.remove();
}

This is all synchronized on the MemoryCache instance. The iterator (it) walks over every entry in the cache and yields the time each document was last touched. An NPE on that line suggests that the document for the current entry either no longer exists in the cache or has no ID. Since the key of the entry is the document ID, the more likely case is that the document is no longer in the cache, so getDocumentById presumably returns null and d.getID() throws.
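To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not Hydra's code: StaleCacheSketch, Doc, removeDocumentOnly and the two-map layout are hypothetical names and assumptions chosen to mirror the loop quoted above. It shows how a timestamp entry that outlives its document would trigger the NPE, and how a null check in the loop would tolerate it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a cache that keeps documents and
// last-touched timestamps in two separate maps. Not Hydra's actual code.
class StaleCacheSketch {
    static final class Doc {
        private final String id;
        Doc(String id) { this.id = id; }
        String getId() { return id; }
    }

    private final Map<String, Doc> documents = new HashMap<>();
    private final Map<String, Long> lastTouched = new HashMap<>();

    synchronized void add(String id, long now) {
        documents.put(id, new Doc(id));
        lastTouched.put(id, now);
    }

    // Assumed bug for illustration: the timestamp entry is left behind.
    synchronized void removeDocumentOnly(String id) {
        documents.remove(id);
    }

    // Mirrors the quoted loop: dereferences the looked-up document unchecked.
    synchronized List<Doc> removeStaleUnsafe(long now, long stalerThanMs) {
        List<Doc> stale = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastTouched.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() > stalerThanMs) {
                Doc d = documents.get(entry.getKey());
                stale.add(d);
                documents.remove(d.getId()); // throws if d is null
                it.remove();
            }
        }
        return stale;
    }

    // Null-safe variant: an orphaned timestamp is dropped, not dereferenced.
    synchronized List<Doc> removeStaleSafe(long now, long stalerThanMs) {
        List<Doc> stale = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastTouched.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() > stalerThanMs) {
                Doc d = documents.remove(entry.getKey());
                if (d != null) {
                    stale.add(d);
                }
                it.remove();
            }
        }
        return stale;
    }

    synchronized int trackedTimestamps() {
        return lastTouched.size();
    }

    public static void main(String[] args) {
        StaleCacheSketch cache = new StaleCacheSketch();
        cache.add("a", 0L);
        cache.removeDocumentOnly("a"); // leaves an orphaned timestamp for "a"
        try {
            cache.removeStaleUnsafe(1000L, 10L);
        } catch (NullPointerException e) {
            System.out.println("unsafe loop threw NullPointerException");
        }
    }
}
```

The null check only hides the symptom. If the real cache can end up with a timestamp but no document, the removal path that causes that is still worth finding.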

Outputting a document marks it as processed, using this method:
https://github.com/Findwise/Hydra/blob/0.4.0/database/src/main/java/com/findwise/hydra/CachingDocumentNIO.java#L121

public boolean markProcessed(DatabaseDocument<T> d, String stage) {
    DatabaseDocument<T> cached = cache.getDocumentById(d.getID());
    if (cached != null) {
        d.putAll(cached);
        cache.remove(d.getID());
    }
    if (writer.markProcessed(d, stage)) {
        return true;
    }
    return false;
}

So documents that are marked as processed should be removed from the cache. But in that case they should not show up in the iterator over documents that are about to be flushed, either.
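The expectation above only holds if removing a document also removes its last-touched entry. Whether cache.remove in Hydra 0.4.0 does this is not confirmed here. The following is a sketch of that invariant under an assumed two-map layout; SyncedCacheSketch and isConsistent are hypothetical names, not Hydra APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a removal path that keeps the document map and the
// timestamp map in sync, so a stale-entry scan never sees an orphaned key.
class SyncedCacheSketch {
    private final Map<String, String> documents = new HashMap<>();
    private final Map<String, Long> lastTouched = new HashMap<>();

    synchronized void add(String id, String content, long now) {
        documents.put(id, content);
        lastTouched.put(id, now);
    }

    // Drops the document and its timestamp under the same lock.
    // Returns the removed content, or null if the ID was not cached.
    synchronized String remove(String id) {
        lastTouched.remove(id);
        return documents.remove(id);
    }

    // Debugging aid: both maps should always track the same set of IDs.
    synchronized boolean isConsistent() {
        return documents.keySet().equals(lastTouched.keySet());
    }

    public static void main(String[] args) {
        SyncedCacheSketch cache = new SyncedCacheSketch();
        cache.add("doc-1", "payload", 0L);
        cache.remove("doc-1");
        System.out.println("consistent after remove: " + cache.isConsistent());
    }
}
```

A check like isConsistent could be called temporarily before the flush loop while debugging, to see which operation first leaves the two structures out of step.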

Do you have any more information about the pipeline, and do you know of any condition that triggers this?