dice-group/gerbil

Worker threads get stuck waiting for Semaphore

Closed this issue · 1 comments

Error description

All worker threads seem to be stuck while preparing the datasets or post-processing the received annotator results.

The stack traces of the workers:

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-1 (NIF WS)","D-1 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=100.0% of dataset
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:50)
org.aksw.gerbil.execute.ExperimentTask.prepareAnnotatorResults(ExperimentTask.java:229)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:330)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:143)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-2 (NIF WS)","D-2 (uploaded)","RE","STRONG_ENTITY_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMarkings(SameAsRetrieverUtils.java:31)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:75)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-1 (NIF WS)","D-3 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMarkings(SameAsRetrieverUtils.java:31)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:75)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-3 (NIF WS)","D-2 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=100.0% of dataset
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:50)
org.aksw.gerbil.execute.ExperimentTask.prepareAnnotatorResults(ExperimentTask.java:229)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:330)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:143)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","ACE2004","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMarkings(SameAsRetrieverUtils.java:31)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:75)
org.aksw.gerbil.dataset.SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:50)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:50)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:104)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-3 (NIF WS)","D-4 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=100.0% of dataset
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:50)
org.aksw.gerbil.execute.ExperimentTask.prepareAnnotatorResults(ExperimentTask.java:229)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:330)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:143)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("A-1 (NIF WS)","D-4 (uploaded)","A2KB","WEAK_ANNOTATION_MATCH")
state=WAITING
progress=100.0% of dataset
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:50)
org.aksw.gerbil.execute.ExperimentTask.prepareAnnotatorResults(ExperimentTask.java:229)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:330)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:143)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

Summary of stack traces

  • XXX is one of the built-in annotators
  • A-x is a NIF-based webservice where x denotes an ID
  • D-x is an uploaded dataset

| Thread ID | Annotator | Dataset | Progress | Position of thread |
|---|---|---|---|---|
| 1 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 2 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 3 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 4 | A-1 | D-1 | 100% | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 5 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 6 | A-2 | D-2 | null | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 7 | XXX | ACE2004 | null | SingletonDatasetConfigImpl.getPreparedDataset(SingletonDatasetConfigImpl.java:47) |
| 8 | A-1 | D-3 | null | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 9 | A-3 | D-2 | 100% | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 10 | XXX | ACE2004 | null | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 11 | A-3 | D-4 | 100% | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |
| 12 | A-1 | D-4 | 100% | FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:115) |

It looks like threads 1, 2, 3, 5 and 7 are waiting for thread 10 to finish initialising the ACE2004 dataset, which seems fine. However, threads 4, 6, 8, 9, 10, 11 and 12 all seem to be waiting for access to the FileBasedCachingSameAsRetriever, and it is unclear which thread failed to release the Semaphore beforehand.

There is no exception in the logs that seems to be related to this. GERBIL is configured to use 12 worker threads and all of them are still alive, so no thread has crashed.
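A per-thread summary like the one above can also be gathered from inside the JVM. The following is only a minimal sketch, not GERBIL code: the class `WaitingThreadSummary` and its methods are hypothetical names, and it relies solely on `Thread.getAllStackTraces()` from the JDK. It counts WAITING threads by the first stack frame that lies outside the JDK packages.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of GERBIL): groups WAITING threads by their first non-JDK frame. */
class WaitingThreadSummary {

    /** Returns the first frame outside java.*, sun.* and jdk.* classes, or null if there is none. */
    static StackTraceElement firstApplicationFrame(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            String cls = frame.getClassName();
            if (!cls.startsWith("java.") && !cls.startsWith("sun.") && !cls.startsWith("jdk.")) {
                return frame;
            }
        }
        return null;
    }

    /** Counts the WAITING threads of the current JVM per application frame. */
    static Map<String, Integer> summarize() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            if (entry.getKey().getState() != Thread.State.WAITING) {
                continue; // only parked or waiting threads are of interest here
            }
            StackTraceElement frame = firstApplicationFrame(entry.getValue());
            if (frame != null) {
                counts.merge(frame.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per blocking position, e.g. how many threads wait at the same acquire() call site.
        summarize().forEach((position, count) -> System.out.println(count + " x " + position));
    }
}
```

Many threads parked at the same call site, as in the table above, point at the lock or semaphore acquired there.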

Proposed solution

It should be checked whether the usage of the Semaphore class is really necessary. Especially within the FileBasedCachingSameAsRetriever, other, safer synchronisation mechanisms could be useful.

The proposed solution above does not work in this case, since the class makes use of two Semaphores:

    private static final int MAX_CONCURRENT_READERS = 1000;

    private Semaphore cacheReadMutex = new Semaphore(MAX_CONCURRENT_READERS);
    private Semaphore cacheWriteMutex = new Semaphore(1);

In line 115, the cacheReadMutex is acquired; it is released later, in line 169:

        try {
            cacheReadMutex.acquire();
        } catch (InterruptedException e) {
            LOGGER.error("Exception while waiting for read mutex. Returning null.", e);
            return null;
        }
        ...
        cacheReadMutex.release();
        return result;

The issue is caused by this part of the code not being covered by a try-finally construct. Whenever the code between acquire() and release() is left without reaching the release() call, one read permit is lost. In this way, the 1000 available read permits were lost over the time GERBIL was running. As soon as all of them were gone, the service got stuck.
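The permit loss can be reproduced in isolation. The sketch below is a hypothetical demo, not GERBIL code: `PermitLeakDemo` and its methods are made-up names, the failure is simulated, and `acquireUninterruptibly()` is used instead of `acquire()` only to keep the example free of checked exceptions.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical demo (not GERBIL code): an exception between acquire and release loses a permit. */
class PermitLeakDemo {

    /** Mimics the unguarded pattern: release() is skipped whenever the work in between throws. */
    static void unguardedRead(Semaphore readPermits, boolean fail) {
        readPermits.acquireUninterruptibly();
        if (fail) {
            throw new IllegalStateException("simulated failure while reading the cache");
        }
        readPermits.release();
    }

    /** Performs the given number of failing reads and returns how many permits remain. */
    static int permitsLeftAfterFailures(int permits, int failures) {
        Semaphore readPermits = new Semaphore(permits);
        for (int i = 0; i < failures; i++) {
            try {
                unguardedRead(readPermits, true);
            } catch (IllegalStateException e) {
                // The caller survives the failure, but the permit is never returned.
            }
        }
        return readPermits.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println(permitsLeftAfterFailures(3, 2)); // prints 1
        System.out.println(permitsLeftAfterFailures(3, 3)); // prints 0: the next acquire would block forever
    }
}
```

With 1000 permits the same effect simply takes longer to show, which fits a service that runs for a while before it hangs.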

Proposed solution

  • Fix this part of the class using try-finally.
  • Go through all other classes using Semaphores and check them for the same issue.
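The first point could look roughly like the following sketch. It is not the actual patch: the class `GuardedCacheRead`, the `read` method and the `Supplier` parameter are assumptions made for illustration, and only `MAX_CONCURRENT_READERS` and `cacheReadMutex` are taken from the snippet above. The release sits in a finally block, so the permit is returned on every exit path.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical sketch of the try-finally fix; not the actual FileBasedCachingSameAsRetriever. */
class GuardedCacheRead {
    private static final int MAX_CONCURRENT_READERS = 1000;

    private final Semaphore cacheReadMutex = new Semaphore(MAX_CONCURRENT_READERS);

    /** Runs the lookup while holding one read permit; the permit is released on every exit path. */
    <T> T read(Supplier<T> lookup) {
        try {
            cacheReadMutex.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            return null;                        // nothing was acquired, so nothing to release
        }
        try {
            return lookup.get();
        } finally {
            cacheReadMutex.release(); // also runs if lookup.get() throws or returns early
        }
    }

    int availableReadPermits() {
        return cacheReadMutex.availablePermits();
    }

    public static void main(String[] args) {
        GuardedCacheRead cache = new GuardedCacheRead();
        try {
            cache.read(() -> {
                throw new IllegalStateException("simulated failure");
            });
        } catch (IllegalStateException e) {
            // expected: the failure propagates, the permit does not leak
        }
        System.out.println(cache.availableReadPermits()); // prints 1000
    }
}
```

The acquire call stays outside the guarded try block on purpose: if acquire() is interrupted, no permit is held, and a release() in finally would add a permit that was never taken.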