FileDistributedLock is not thread safe
Invenisso opened this issue · 6 comments
In heavy multithreading scenario (>10 parallel threads) 84. line throws System.UnauthorizedAccessException which is not properly caught.
lockFileStream = new FileStream(this.Name, FileMode.OpenOrCreate, FileAccess.Read, FileShare.None, bufferSize: 1, FileOptions.DeleteOnClose);
Stack trace:
System.UnauthorizedAccessException
HResult=0x80070005
Message=Access to the path 'C:\Users\adamp\AppData\Local\Temp\execute_workflow_definitionZDIA7GR5FQLVK2GQ6WQSLQLW3FV3XHCL.lock' is denied.
Source=System.Private.CoreLib
StackTrace:
at System.IO.FileStream.ValidateFileHandle(SafeFileHandle fileHandle)
at System.IO.FileStream.CreateFileOpenHandle(FileMode mode, FileShare share, FileOptions options)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
at Medallion.Threading.FileSystem.FileDistributedLock.TryAcquire(CancellationToken cancellationToken) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.FileSystem\FileDistributedLock.cs:line 84
at Medallion.Threading.FileSystem.FileDistributedLock.<>c.<Medallion.Threading.Internal.IInternalDistributedLock<Medallion.Threading.FileSystem.FileDistributedLockHandle>.InternalTryAcquireAsync>b__10_0(FileDistributedLock this, CancellationToken token) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.FileSystem\FileDistributedLock.cs:line 58
at Medallion.Threading.Internal.BusyWaitHelper.<WaitAsync>d__0`2.MoveNext() in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\BusyWaitHelper.cs:line 54
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Threading.Tasks.ValueTask`1.get_Result()
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
at Medallion.Threading.Internal.Helpers.<Convert>d__1`2.MoveNext() in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Helpers.cs:line 24
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Threading.Tasks.ValueTask`1.get_Result()
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
The exception is later caught, but only if a directory with the same name as the file already exists. You also need to catch this exception without any condition and return null as the result.
(Tested on Windows 10.)
Thanks for reporting @Invenisso . I have a related but distinct issue filed here: #106.
One thing I'm struggling with is that (afaik) there are times when we can hit UnauthorizedAccessException
because of legitimate permissions issues that should be reported back to the caller as an exception (see https://docs.microsoft.com/en-us/dotnet/api/system.io.filestream.-ctor?view=net-5.0#System_IO_FileStream__ctor_System_String_System_IO_FileMode_System_IO_FileAccess_System_IO_FileShare_System_Int32_System_IO_FileOptions_).
I want to avoid a case where we legitimately don't have permission to the lock file but we behave as though we just couldn't acquire it.
One thought I have is that UnauthorizedAccessException
should trigger an immediate retry, but only up to N times. After N, we assume that this is a real permissions failure. Obviously this still isn't perfect because it could still fail in a heavy multi-threading scenario where locks are rapidly being taken and released.
Do you have a snippet of code that easily reproduces this for you? What version of .NET are you running on?
Ok I have a test which repros this reliably:
[Test]
public void TestHeavyParallelism()
{
var directory = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
var @lock = new FileDistributedLock(new DirectoryInfo(directory), name: Guid.NewGuid().ToString());
const int TaskCount = 15;
const int AcquireCount = 500;
const double DeleteDirectoryProbability = .1;
using var barrier = new Barrier(TaskCount);
var tasks = Enumerable.Range(0, TaskCount)
.Select(i => Task.Run(() =>
{
barrier.SignalAndWait();
var random = new Random(i);
for (var i = 0; i < AcquireCount; ++i)
{
@lock.Acquire().Dispose();
if (random.NextDouble() < DeleteDirectoryProbability)
{
try { Directory.Delete(directory); }
catch { }
}
}
}))
.ToArray();
Assert.DoesNotThrow(() => Task.WaitAll(tasks));
}
@Invenisso I've published a prerelease version of DistributedLock.FileSystem containing what I believe to be a fix. Would you mind trying it out in your environment and letting me know if it resolves the issues you were seeing? Thanks!
https://www.nuget.org/packages/DistributedLock.FileSystem/1.0.1-rc01
@madelson Thank you for taking care of the problem! I will check your fix as soon as I get back to work on monday, because now I don't have access to my environment.
I can confirm that the error is no longer present when I use the new package. Thanks for fixing. :-)
Released in DistributedLock.FileSystem 1.0.1 / DistributedLock 2.3.0