orlangure/gnomock

Bug: Nearly impossible to debug failing containers

Haegi opened this issue · 12 comments

Haegi commented

Describe the bug
When trying to start a custom container and then it fails, it is nearly impossible to debug what happened.
This is mainly due to two reasons:

  1. Container is not shown in any history (e.g. docker ps -a)
  2. Logs are also not there anymore. Neither using docker logs, nor when trying to get the logs from the /var/lib/docker/containers/ directory.

Also options like WithDisableAutoCleanup() or WithDebugMode() don't seem to change anything.

To Reproduce
There are a lot of options to reproduce. For example the code below will fail with Error running container: can't start container: container network isn't ready: can't inspect container xxx: Error: No such container: xxx. But there is no indication on what really happened.

_, err := gnomock.StartCustom("busybox", gnomock.DefaultTCP(8080))
if err != nil {
	fmt.Printf("Error running container: %s", err)
}

Expected behavior
A proper error message is shown or at least it is possible to check the logs or the description of the failing container.

System (please complete the following information):

  • OS: MacOs
  • Version 12.4
  • Docker version: Docker Desktop 4.11.1 (84025)
  • gnomock version: v0.24.0
  • go version: 18.4 darwin/arm64

Additional context
Might be related to #8

Hi @Haegi and thank you for the report.
There is a number of ways to debug failing containers (which you correctly identified):

  1. WithDebugMode will print internal logs and prevent automatic container removal
  2. WithLogWriter will send internal container logs to the provided writer (such as os.Stdout) so you can save the logs regardless of final container state.

In case of busybox, the container you start exits immediately with code zero, so there is nothing to wait for and to use later, therefore the no such container error.

I can confirm though that the containers that exit successfully are removed regardless of WithDebugMode, and this probably should be fixed. If you can, maybe you could think about a proper solution? Otherwise I'll try to look into it during this or one of the following weekends. Thanks!

Haegi commented

Hi @orlangure , thanks for the fast response and explanation.

Unfortunately, I am not using busybox but some custom container. The busybox was just some example.
But with the custom container the same happens.It exists immediately. But just in CI and not on my local machine. That means that the container exists for some reason but I don’t know why and it would be nice to at least see the logs.

Maybe the WithDebugMode or WithDisableAutoCleanup() should not cleanup successful containers. That would be already very helpful.
Some additional option would be maybe to forward the logs of the immediately exiting container. But I am not sure if this is easily possible because it seems like the code currently is first doing the waitForContainerNetwork/ContainerInspect before forwarding the logs. And then the container is already cleaned up.

@Haegi use WithLogWriter to see all the logs immediately as soon as they appear. If your container prints something, you'll see it.

Haegi commented

Maybe I am missing something but the following command doesn't show anything. But only Error running container: can't start container: container network isn't ready: can't inspect container xxx: Error: No such container: xxx

_, err := gnomock.StartCustom("busybox", gnomock.DefaultTCP(5000),
	gnomock.WithLogWriter(os.Stdout), gnomock.WithCommand("whoami"), gnomock.WithDebugMode())
if err != nil {
	fmt.Printf("Error running container: %s", err)
}

But docker run busybox whoami prints root.

Because docker run busybox prints nothing, so there are no logs.
Try to debug your actual container using WithLogWriter and see what it prints before exiting.

Haegi commented

Because docker run busybox prints nothing, so there are no logs. Try to debug your actual container using WithLogWriter and see what it prints before exiting.

I am sorry but docker run busybox whoami prints root for me. But I can also try it with the actual container but I expect the same (no log) result

@Haegi did you have any luck with WithLogWriter output of your real use case?

Haegi commented

Unfortunately, not. The container exits before the logging forwarding is even established.
It looks like the inspect in the waitForContainerNetwork is failing because the container is not there anymore. And I guess this triggers then the cleanup.
But maybe I am also missing something.

The reason most likely is the AutoRemove flag that we set on containers. They are deleted by docker when stopped. Containers that stop too quickly don't send their logs. I think the solution should be to not put this flag when WithDebugMode flag is set, but I'm not sure what would be the consequences of it. If you want, you can make this change in your local fork and use it until you figure out the reason, because it will take some time for me to get to this issue.

Haegi commented

I just tried it out and you are right. With AutoRemove=false the container is not removed and it can be inspected.
I would be open to contribute the change to not remove the container when WithDebugMode is specified. But I am also not sure about the consequences and maybe it would be safer to introduce a new gnomock option (e.g. WithDisableAutoRemove). What do you think?

I would rather go with fixing the behaviour of WithDebugMode. This options should not be used in stable builds, so this change should be safe. Would you like to open a pull request with this change?
Thanks.

Haegi commented

I would rather go with fixing the behaviour of WithDebugMode. This options should not be used in stable builds, so this change should be safe. Would you like to open a pull request with this change? Thanks.

Sure, I opened this PR for it: #757