sci-visus/OpenVisus

Lock File Triggers an Infinite Loop

klacansky opened this issue · 4 comments

A running query can leave lock file if it has been interrupted, such as restarting notebook kernel. Currently, the code keeps printing in the log that the lock file exists forever. We need a better way to inform the user about the problem. I suggest we either throw an exception with clear error message or, as Valerio suggested, delete the lock file and corresponding block file and try again (may be problematic if two processes access the file).

Good point. The code is retrying for some minutes (if it's endless waiting, then it is wrong behavior). Could you check and propose a pull request?

I looked through the code and did not find any check to stop after certain time or number of tries. I create a pull request, and I am not sure it breaks any workflows.

what is the right behavior? raise an Exception or return false?

Looked into raising an exception. The issue is the requests run on threads, and we need a mechanism to report a thread failure to the main thread. Similar problem can happen if unlock throws an exception.

I tried to modify MultiplexAccess.cpp to catch exception in the runInBackground function and store it in a shared pointer. However, the main thread does not check this shared pointer so I can't easily rethrow the exception. I tried to rethrow the exception in the destructor, and is is too late and the whole thing spins in an infinite loop. There are also multiple places where this can happen.

What do you think is the right approach? You have a better understanding of the whole system. Personally, I would prefer to have two queues, one for TODO work and one for results + error codes. It is challenging to understand the async code.