tpeczek/Lib.AspNetCore.ServerSentEvents

Race Condition when managing the Groups collection

Closed this issue · 2 comments

It looks like reading/writing to the Groups collection in ServerSentEventsService is not properly synchronized/thread-safe. Although writing is properly synchronised with a semaphore, it is still possible for other threads to be reading & enumerating the dictionary during a write, which causes them to throw, eg on line this: https://github.com/tpeczek/Lib.AspNetCore.ServerSentEvents/blob/main/Lib.AspNetCore.ServerSentEvents/ServerSentEventsService.cs#L392.

In the real world this can happen when eg a client is disconnecting, while at the same time another client is connecting and a new group is being registered for them. The thread handling the disconnect will see a System.InvalidOperationException: Collection was modified; enumeration operation may not execute. exception while trying to clean up the resources for that client. I believe this will then result in a memory leak for that client since a reference to them will remain in the groups collection.

Heres a super contrived reproduction application:

Program.cs

using System;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Lib.AspNetCore.ServerSentEvents;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Options;

namespace SSERaceConditionDemo
{
    public class Program
    {
        public static async Task Main(string[] args)
        {
            var host = Host.CreateDefaultBuilder(args)
                .ConfigureWebHostDefaults(webBuilder => webBuilder.UseStartup<Startup>());

            var webHostTask = host.Build().RunAsync();

            var httpClient = new HttpClient();
            var random = new Random();

            // Create 10 clients which constantly connect & disconnect from the service
            var tasks = Enumerable.Range(0, 10)
                .Select(i =>
                {
                    return Task.Run(async () =>
                    {
                        while (true)
                        {
                            var timeout = random.Next(100, 200);
                            try
                            {
                                await httpClient.GetAsync("http://localhost:5000/", new CancellationTokenSource(timeout).Token);
                            }
                            catch (OperationCanceledException)
                            {
                            }
                        }
                    });
                }).ToArray();

            await webHostTask;
        }


        public class Startup
        {
            public void ConfigureServices(IServiceCollection services)
            {
                services.AddAuthentication();
                services.AddAuthorization();
                services.AddSingleton<DemoServerSentEventsService>();
            }
            
            public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
            {
                app.UseAuthentication();
                app.UseAuthorization();
                app.UseMiddleware<ServerSentEventsMiddleware<DemoServerSentEventsService>>();
            }
        }

        public class DemoServerSentEventsService : ServerSentEventsService
        {
            public DemoServerSentEventsService(IOptions<ServerSentEventsServiceOptions<ServerSentEventsService>> options) : base(options)
            {
            }

            public override async Task OnConnectAsync(HttpRequest request, IServerSentEventsClient client)
            {
                // Add the client to a silly number of groups - This makes enumeration take longer during DC's, increasing the chance of the race condition.
                for (var i = 0; i < 10000; i++)
                {
                    await AddToGroupAsync(Guid.NewGuid().ToString(), client);
                }

                await base.OnConnectAsync(request, client);
            }
        }
    }
}

Csproj:

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Lib.AspNetCore.ServerSentEvents" Version="6.0.0" />
  </ItemGroup>

</Project>

Run this in release mode and it'll spew the exception repeatedly.

I think the simplest solution is to simply swap out the Dictionary with a ConcurrentDictionary and drop the Semaphore for synchronising it since it'll be unnecessary

Thank you for bringing this up with so many details. I'll take a look as soon as possible (probably next week).

Looks like I was able to distill the problem into an unit test.

Indeed it looks like using ConcurrentDictionary will be simplest, I just need to make sure I avoid unnecessary internal locking.