isen-ng/testcontainers-dotnet

StartAsync() hangs when using Windows with wsl2 container

srollinet opened this issue · 8 comments

Hi,

I try to use TestContainers.Container.Database.MsSql on a Windows system with DockerDesktop running with wsl2 engine.

When calling StartAsync the call blocks then times out. The container is started, but it seems that the application still wait for something that does not happen.

Sometimes I have a timeout, sometimes I have an exception saying that the container is not found

Docker.DotNet.DockerContainerNotFoundException
Docker API responded with status code=NotFound, response={"message":"No such container: d63a51529be2fa36b01e69043ea31909c21f3e2c7a87fa6f836f03808954a4be"}

with docker ps, I see the container

CONTAINER ID   IMAGE                                        COMMAND                  CREATED          STATUS          PORTS                                         NAMES
d63a51529be2   quay.io/testcontainers/ryuk:0.2.3            "/app"                   28 seconds ago   Up 27 seconds   0.0.0.0:49159->8080/tcp, :::49159->8080/tcp   inspiring_poitras

First, I thought that it was an issue with ryuk, so I tried to disable it via REAPER_DISABLED env variable, but same thing happens with the mssql server container

I also tried to update to the last version of Docker.DotNet but I still have the same problem

My env is
Windows 10 pro 21H1
Docker desktop 4.0.1
.Net 5.0.40
TestContainers.Container.Database.MsSql 1.5.0
Docker.DotNet 3.125.5

The container configuration is

new ContainerBuilder<MsSqlContainer>()
   .ConfigureDockerImageName("mcr.microsoft.com/mssql/server:2019-latest")
   .ConfigureDatabaseConfiguration("not-used", "yourStrong(!)Password", "not-used")
   .Build()

Edit: the logs say that the container has started

dbug: TestContainers.Container.Abstractions.DockerClient.DockerClientFactory[0]
Testing provider: NpipeDockerClientProvider
dbug: TestContainers.Container.Abstractions.DockerClient.DockerClientFactory[0]
Provider[NpipeDockerClientProvider] found
local npipe: [npipe://./pipe/docker_engine]
dbug: TestContainers.Container.Abstractions.DockerClient.DockerClientFactory[0]
Testing provider: NpipeDockerClientProvider
dbug: TestContainers.Container.Abstractions.DockerClient.DockerClientFactory[0]
Provider[NpipeDockerClientProvider] found
local npipe: [npipe://./pipe/docker_engine]
dbug: TestContainers.Container.Database.MsSql.MsSqlContainer[0]
Adding session labels to generic container: 1b9e7108-34b0-4274-9045-3ec9d00a5620
dbug: TestContainers.Container.Database.MsSql.MsSqlContainer[0]
Starting reaper ...
dbug: TestContainers.Container.Abstractions.Reaper.ResourceReaper[0]
Starting ryuk container ...
dbug: TestContainers.Container.Abstractions.Images.GenericImage[0]
Image already exists, not pulling: quay.io/testcontainers/ryuk:0.2.3
dbug: TestContainers.Container.Abstractions.Reaper.RyukContainer[0]
Creating container for image: quay.io/testcontainers/ryuk:0.2.3
dbug: TestContainers.Container.Abstractions.Reaper.RyukContainer[0]
Starting container with id: d933bcad92376dd67c89675e3c3883df855885de00e31e2244a9eaae750c546c
dbug: TestContainers.Container.Abstractions.Reaper.RyukContainer[0]
Container quay.io/testcontainers/ryuk:0.2.3 startup complete
fail: TestContainers.Container.Abstractions.Reaper.RyukContainer[0]
Unable to start container services: quay.io/testcontainers/ryuk:0.2.3
TestContainers.Container.Abstractions.Exceptions.ContainerLaunchException: Container.Abstractions startup failed
---> Polly.Timeout.TimeoutRejectedException: The delegate executed asynchronously through TimeoutPolicy did not complete within the timeout.
---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Polly.Retry.AsyncRetryEngine.ImplementationAsync[TResult](Func3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates1 shouldRetryResultPredicates, Func5 onRetryAsync, Int32 pe rmittedRetryCount, IEnumerable1 sleepDurationsEnumerable, Func4 sleepDurationProvider, Boolean continueOnCapturedContext) at Polly.AsyncPolicy.ExecuteAsync(Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
at Polly.Wrap.AsyncPolicyWrapEngine.<>c__DisplayClass4_0.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Polly.AsyncPolicy.<>c__DisplayClass40_0.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Polly.Timeout.AsyncTimeoutEngine.ImplementationAsync[TResult](Func3 action, Context context, CancellationToken cancellationToken, Func2 timeoutProvider, TimeoutStrategy timeoutStrategy, Func5 onTimeoutAsync, Boolean continueOnCapturedContext) --- End of inner exception stack trace --- at Polly.Timeout.AsyncTimeoutEngine.ImplementationAsync[TResult](Func3 action, Context context, CancellationToken cancellationToken, Func2 timeoutProvider, TimeoutStrategy timeoutStrategy, Func5 onTimeoutAsync, Boolean continueOnCapturedContext)
at Polly.AsyncPolicy.ExecuteAsync(Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext) at Polly.Wrap.AsyncPolicyWrapEngine.ImplementationAsync(Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext, IAsyncPolicy outerPolicy, IAsyncPolicy innerPolicy)
at Polly.AsyncPolicy.ExecuteAsync(Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext) at Polly.AsyncPolicy.ExecuteAndCaptureAsync(Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
--- End of inner exception stack trace ---
at TestContainers.Container.Abstractions.WaitStrategies.ExposedPortsWaitStrategy.WaitUntil(IContainer container)
at TestContainers.Container.Abstractions.AbstractContainer.StartServices(CancellationToken ct)
Unhandled exception. Docker.DotNet.DockerContainerNotFoundException: Docker API responded with status code=NotFound, response={"message":"No such container: d933bcad92376dd67c89675e3c3883df855885de00e31e2244a9eaae750c546c"}

at Docker.DotNet.ContainerOperations.<>c.<.cctor>b__33_0(HttpStatusCode statusCode, String responseBody)
at Docker.DotNet.DockerClient.HandleIfErrorResponseAsync(HttpStatusCode statusCode, HttpResponseMessage response, IEnumerable1 handlers) at Docker.DotNet.DockerClient.MakeRequestForStreamAsync(IEnumerable1 errorHandlers, HttpMethod method, String path, IQueryString queryString, IRequestContent body, IDictionary`2 headers, TimeSpan timeout, CancellationToken token)
at TestContainers.Container.Abstractions.AbstractContainer.PrintContainerLogs(CancellationToken ct)
at TestContainers.Container.Abstractions.AbstractContainer.StartServices(CancellationToken ct)
at TestContainers.Container.Abstractions.AbstractContainer.StartAsync(CancellationToken ct)
at TestContainers.Container.Abstractions.Reaper.ResourceReaper.StartAsync(IDockerClient dockerClient, ILoggerFactory loggerFactory)
at TestContainers.Container.Abstractions.GenericContainer.ContainerStarting()
at TestContainers.Container.Abstractions.AbstractContainer.StartAsync(CancellationToken ct)
at TestContainer.Program.Main(String[] args) in C:\Users\sro\RiderProjects\Playground\TestContainer\Program.cs:line 20
at TestContainer.Program.

(String[] args)

Note: a colleague tried with more or less the same env, but it works as expected, so it's probably an issue with my env...
I'll let you know if I find the issue

Ok, the issue seem to be caused by port exclusion. I have a lot of TCP port excluded on my machine and it seem that the mapped port is chosen in this range

netsh interface ipv4 show excludedportrange protocol=tcp

Protocol tcp Port Exclusion Ranges

Start Port    End Port
----------    --------
      5700        5700
      5985        5985
      8884        8884
     47001       47001
     49152       49251
     49252       49351
     49553       49652
     49755       49854
     49855       49954
     50000       50059     *
     50060       50159
     50160       50259
     50260       50359
     50360       50459
     50772       50871
     50872       50971
     50972       51071
     51072       51171
     51172       51271
     51272       51371
     51372       51471
     51555       51654
     51750       51849     *
     51850       51949
     51950       52049
     52050       52149
     53286       53385
     53386       53485
     53486       53585
     53586       53685
     53686       53785
     53786       53885
     53886       53985
     53986       54085
     64142       64241
     64242       64341
     64342       64441
     64442       64541
     64542       64641
     64642       64741
     64742       64841
     64957       65056
     65057       65156
     65157       65256
     65257       65356
     65357       65456

When tried, the chosen mapped port was 49173, which is in an excluded range.

The strange thing is that if I try to map this port when starting a docker image with the CLI, I have an exception and the container cannot start. So I don't know how docker.dotnet manage to start it

What is happening is that your MsSql container dies after being started. You suspicion is probably correct to say that maybe the exposed port is blocked.

What you can do is this:

  1. when the unit test is starting the container,
  2. type docker ps -a
  3. this will list all containers, including the MsSql container that was started, but already died.
  4. docker logs <your dead mssql container id or name> to inspect it's logs

from the logs, you might find why the container is dead

The problem is that the container does not die in fact. For example for ryuk it starts, The log is something like 'Waiting for connexion on port 8080'

docker ps reports that the port 8080 is mapped to port xxx, which is in an excluded range, but in fact the port is not open, at least not in Windows (probably it is open in the linux VM that hosts docker)

And finally, the test container timeouts because the mapped port is not reachable.

I haven't found how the random(?) mapped port is chosen. Is it done by testcontainers or by docker.dotnet?

Oh you mean TestContainers timeout when waiting to connect to ryku!
Ok i got it.

The random port is not selected by TestContainer or Docker.Dotnet.
It is selected by the docker engine.

The docker engine maps 8080 to a random port in the higher port range (> 32000). Now, I'm not sure how the docker engine does it, but my guess is to map it to port 0 and let the OS pick a port in the high port ranges.

Ok, so lot of people have the same problem docker/for-win#3171 (comment) / docker/for-win#11584
and nothing related to testcontainers.

Do you know if it is possible to define a port explicitly, to workaround this issue?

Unfortunately, there isn't currently a way to map it. You can, however, disable the reaper for now

Yes, but I have the same problem with the SQL container, it also takes an excluded port and TestContainers timeouts when trying to connect. But I've seen that the port binding is configurable for the DB, so everything is ok :)