docker/for-win

too much memory and endless docker stopping

nchj opened this issue · 30 comments

nchj commented
  • I have tried with the latest version of Docker Desktop
  • I have tried disabling enabled experimental features
  • I have uploaded Diagnostics
  • Diagnostics ID:A45EF8A3-94C0-47F1-882D-AA7598699931/20230207095829

Actual behavior

After updating to the latest Docker Desktop, I noticed that Docker uses a lot of memory (all 32 GB of my RAM) after running for a long time, with no containers running in it. This was before I set a memory limit in .wslconfig. Docker Desktop also sometimes gets stuck in an endless "Docker stopping" state after running for about 12-24 hours, and I have to restart Windows to recover.
I have tried uninstalling and reinstalling Docker Desktop, but it did not help.
Now that I have limited memory in .wslconfig, memory consumption has dropped, but the endless stopping still happens.

Expected behavior

Use less memory, and stop quickly.

Information

  • Windows Version: win11 pro 22H2 build 22621.1105
  • Docker Desktop Version:4.16.3 (96739)
  • WSL2 or Hyper-V backend? wsl2
  • Are you running inside a virtualized Windows e.g. on a cloud server or a VM:no

Output of & "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check

Steps to reproduce the behavior

  1. Start Docker Desktop.
  2. Do nothing; wait about 12-24 hours.
  3. Stop or restart Docker; it gets stuck in "stopping" endlessly.

Same for me on Windows 10.0.19042 N/A Build 19042 (on a physical laptop) with WSL2 and Docker Desktop 4.16.3 (96739). Also, memory is consumed without any real activity in Docker (no image builds, running containers, etc.), just by having it run in the background.

Same here, but I see high memory usage within just a few minutes. Usually within a few hours the whole WSL service crashes and I'm unable to recover it. The only working solution is a Windows reboot.

nchj commented

In my case, if I don't configure a memory limit in .wslconfig, high memory usage appears within a short time or a few hours (even with no other activity). If I stop Docker frequently, it stops successfully; if I don't stop it for 12-24 hours or more, Docker gets stuck in endless stopping and finally crashes. Only rebooting Windows helps, so I think it's the same problem.

My solution so far has been to downgrade to 4.16.2 and, whenever I notice it using too much memory (it gets allocated as cache inside WSL), issue this command from the docker-desktop distro (wsl --distribution docker-desktop):
echo 3 > /proc/sys/vm/drop_caches

Got the idea from here: microsoft/WSL#8725 (comment)

Be sure to back up your data from docker containers when uninstalling Docker desktop to revert to an earlier version (no way to simply downgrade).

@Mikk36, how are you running wsl --distribution docker-desktop? When I run it, PowerShell just locks up.

[screenshot]

Thanks, it works after I restarted my computer.
Wonder if I can run it on a cron

Vmmem RAM usage is going down; it was already at 10 GB after a day.

The /etc/periodic/daily folder exists, but maybe it needs to run more frequently than that.

cat /proc/meminfo shows a bunch of RAM figures; I'm not sure whether any of them can be used to detect when this problem is happening.

grep MemTotal /proc/meminfo
A sh script could read these values as needed and decide when to flush cached RAM.
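A minimal sketch of such a script, assuming `sh` and `awk` are available inside the distro. The function names and the idea of using "Cached as a percentage of MemTotal" as the trigger are hypothetical choices for illustration, not an established detection rule for this problem; pick a threshold that fits your own workload.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether to flush the page cache based on how
# much of MemTotal is currently reported as "Cached" in /proc/meminfo.

# cached_percent [FILE] -> prints Cached as an integer percentage of MemTotal
cached_percent() {
    awk '/^MemTotal:/ { total = $2 }
         /^Cached:/   { cached = $2 }
         END { if (total > 0) printf "%d\n", cached * 100 / total; else print 0 }' \
        "${1:-/proc/meminfo}"
}

# should_drop THRESHOLD [FILE] -> exit status 0 if Cached% >= THRESHOLD
should_drop() {
    [ "$(cached_percent "$2")" -ge "$1" ]
}

# drop_if_needed THRESHOLD -> must run as root inside the WSL distro
drop_if_needed() {
    if should_drop "$1"; then
        sync                               # write dirty pages to disk first
        echo 3 > /proc/sys/vm/drop_caches  # drop page cache and reclaimable slab
    fi
}
```

For example, `drop_if_needed 50` (run as root) would flush only when at least half of the VM's memory is page cache.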

But are there any downsides to doing this too frequently?

https://unix.stackexchange.com/questions/17936/setting-proc-sys-vm-drop-caches-to-clear-cache

https://www.geeksforgeeks.org/sync-command-in-linux-with-examples/

As the kernel's drop_caches documentation puts it: "The user may run `sync` prior to writing to /proc/sys/vm/drop_caches. This will minimize the number of dirty objects on the system and create more candidates to be dropped."
# Flush dirty pages to disk first, so more cached objects become candidates to be dropped.
sync
# To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
# To free reclaimable slab objects (includes dentries and inodes):
echo 2 > /proc/sys/vm/drop_caches
# To free slab objects and pagecache:
echo 3 > /proc/sys/vm/drop_caches

There is also RamMap from Sysinternals if we want to get the other side on Windows too.

nchj commented

So Docker has not officially fixed this problem so far. After the endless stops, I uninstalled Docker Desktop and switched to Linux, and I'm waiting for the official fix.

bpoxy commented

I've had a similar issue with every Docker Desktop version after 4.15.0.
Whenever a new version is released, I try it out to see if the issue has been resolved but no such luck yet (as of 4.18.0).

When my resource-intensive container (styler00dollar/vsgan_tensorrt) is started, Vmmem quickly grows until it consumes all 64 GB of RAM. For context, Vmmem memory usage is ~9 GB when running this container with 4.15.0. Killing and deleting the container does not release the RAM, and I have to run wsl --shutdown to reclaim it. In both cases Docker reports the container's RAM usage as ~5 GB, so I'm not sure where all the RAM is going in versions after 4.15.0.

nchj commented

For now, my workaround is to install Docker directly in WSL2 rather than using Docker Desktop. It works perfectly, and ports in WSL2 can be reached from Windows directly via localhost:portnum, but VS Code dev containers are not as easy to use this way. Maybe you can give it a try.

Is there any acknowledgement from the devs on this? I'm still having this problem: it's eating over 16 GB of RAM and can't shut down.

nchj commented

The answer seems to be no.

I use docker installed in wsl2 instead

Hi @nchj, thanks for reporting the issue and apologies for the belated response on this.

We are actively looking into Docker Desktop WSL memory consumption to understand why it's so high for some users.

To help us root cause, could you provide the following info:

  • Amount of physical RAM on your host. By default WSL will allocate up to 50% of the RAM to the WSL distros as needed, unless capped to a lower value in the .wslconfig file.

  • Do you have any other WSL distros running (wsl -l -v)? (i.e., other than the docker-desktop and docker-desktop-data created by Docker Desktop). NOTE: All WSL distros share the same underlying Linux kernel instance, so mem consumed by one distro will affect all other distros.

  • Without any containers running, what is the memory consumption of the vmmemWSL process in the Windows Task Manager?

  • Please provide the output of top within the Docker-Desktop VM, so we can see what's consuming memory in there.

$ docker run --rm --pid=host ubuntu top -b -n 1 -o +%MEM

Thanks!

nchj commented

Thanks.
It was a long time ago and I have since installed Docker directly in WSL2, so I cannot reproduce this; the answers below are from memory.

Amount of physical RAM on your host: 32 GB

Do you have any other WSL distros running: yes, Ubuntu 20.04, but shutting them down does not help.

Memory consumption of the vmmemWSL process: I remember it being almost 32 GB.

Output of top within the Docker Desktop VM: it was a long time ago; maybe output that others posted in this issue or related issues, such as #13325, can help.

thanks

Thanks @nchj for the info.

Allow me to mark this as a duplicate of #13325. As I commented in that issue, the problem seems to be that inside the WSL VM, the Linux kernel page cache can eventually consume a large amount of memory that is never reclaimed by the Windows host, even if no Docker containers are running.

While we work on a solution, a couple of workarounds are:

  1. Remove unnecessary Docker images (e.g., docker image rm ...); this causes the Linux kernel inside the WSL VM to free kernel memory that caches those images. This in turn causes WSL to return that memory to the host Windows OS.

  2. Clear the page cache via echo 3 > /proc/sys/vm/drop_caches. Same as above, but clears all Linux kernel page caching.

One or both of these need to be done periodically, as running new containers or building new images will cause the Linux kernel page cache in the WSL VM to consume memory again, reducing the memory available to the host.

Duplicate of #13325. Closing.

Could clearing the page cache be run as a cron job inside the Docker WSL distro as a temporary measure?
Does it have a negative impact if it is run every 2 hours, for example?

Hi @LiamKarlMitchell,

As a temporary measure it's fine; the only bad effect is that it may reduce performance of subsequent docker run/build commands (as well as any other programs you run inside the WSL distro) since you are clearing the Linux kernel's page cache (i.e., the in-memory cache for disk accesses). But that's much better than running out of memory in the host :)

Ideally you would run it whenever you are running out of host memory and the WSL distros are idle. If the WSL distros are busy, running it won't mess up anything, but it will reduce performance.
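A minimal sketch of a periodic job that follows this advice by flushing only when the distro looks idle. It assumes the 1-minute load average in /proc/loadavg is an acceptable idleness signal; the function names, the 0.5 threshold, the script path, and the presence of a running cron daemon in the distro are all assumptions, not something Docker Desktop sets up for you.

```shell
#!/bin/sh
# Hypothetical helpers: flush the Linux page cache only when the distro
# looks idle, judged by the 1-minute load average in /proc/loadavg.

# is_idle MAX_LOAD [FILE] -> exit status 0 if the 1-minute load < MAX_LOAD
is_idle() {
    awk -v max="$1" '{ if ($1 + 0 < max + 0) exit 0; exit 1 }' "${2:-/proc/loadavg}"
}

# flush_if_idle MAX_LOAD -> must run as root inside the WSL distro
flush_if_idle() {
    if is_idle "$1"; then
        sync                               # write dirty pages to disk first
        echo 3 > /proc/sys/vm/drop_caches  # drop page cache and reclaimable slab
    fi
}

# Example crontab entry, every 2 hours (assumes a cron daemon is running and
# this file was saved as the hypothetical /root/flush-if-idle.sh):
# 0 */2 * * * . /root/flush-if-idle.sh && flush_if_idle 0.5
```

Checking idleness first keeps the performance cost described above away from busy periods, at the price of sometimes skipping a flush.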

Docker with the Hyper-V backend works well for me, but you sacrifice the RAM that is reserved for it.

Can anyone confirm that Docker Desktop versions earlier than 4.15 do not have the memory leak bug?

Hi @samuk190, I tried Docker Desktop 4.15 and compared its memory consumption to 4.19, and while 4.15 is slightly more efficient than 4.19, I do not see any meaningful difference (see attached figure).

This is on Windows 11 Pro host, with 16GB of Physical RAM, and WSL2 version 1.2.5.0. The WSL VM has an 8GB (virtual) RAM.

[Figure dd-415-vs-419: memory usage of Docker Desktop 4.15 vs. 4.19 in the scenarios listed below]

Legend:
  • Ubuntu Distro Only: RAM usage when running an Ubuntu-22.04 WSL distro (Docker Desktop not yet running).
  • DD Idle: Docker Desktop started but idle.
  • DD Busy: Docker Desktop busy (Elasticsearch Docker Compose workload).
  • DD Idle after busy: memory usage after stopping all containers.
  • DD Drop Kernel Cache: memory usage after running echo 3 > /proc/sys/vm/drop_caches inside the WSL distro.
  • DD Remove Container Images: memory usage after running docker image rm $(docker image ls -aq).

The slightly higher mem utilization in 4.19 is likely due to it using Docker buildx (for faster builds) compared to 4.15 which uses the traditional builder by default (need to double check this).

Also, the bulk of the memory usage when Docker is busy comes from the container workloads themselves. The Docker Engine and related processes consume a small portion of the WSL VM's memory ~15% (not shown).

We continue to work on improving DD memory usage, and as you can see from the figure, the kernel caches inside the WSL VM account for a significant portion of host RAM usage (i.e., notice the mem reduction when the caches get dropped). We are working on techniques to manage the kernel caches so as to reduce host memory usage as much as possible but without affecting performance.

Hope that helps!

@ctalledo Hi, I don't think your graph is right. First, it only shows RAM usage, which I assume means initial RAM usage; it does not show time, only "idle", which means nothing at this point. If I do a docker-compose up on a random image, of course it will use only "x" RAM at first, but after working with the system for 1 or 2 hours you can see RAM slowly stacking up until it uses all available host RAM. The version doesn't matter: I tried both versions you mentioned and hit the same issue. I had only asked whether someone had tested it so I could give it a try, but I tried it anyway and encountered the same issue. I have colleagues at work who use Docker Desktop on Windows and have exactly the same problem on a clean Windows 11 image.
To prove my point, I can show RAM usage in Hyper-V mode, which uses the same version 4.19 with much lower RAM usage and no memory leak. Your graph says: "I'm using image x that takes 500 MB of RAM on version 4.15 and 523 MB on version 4.19."

But what I encounter in real usage is this: I have an image that needs only 1.5 GB of RAM (that's the amount I set on the Hyper-V engine with no problem), and over time it uses 12 GB of RAM on WSL2. If I let it run for a long time, e.g. 5 hours, the container crashes because it runs out of memory.

If this is not fixed soon, WSL2 engine integration should no longer be the recommended option in the setup wizard, because that recommendation is misleading. How is it possible that an image I can run with 1 vCPU and 1.5 GB of RAM with no performance issues on the Docker Desktop Hyper-V engine cannot run properly on the Docker Desktop WSL2 engine even with 10+ GB of RAM and 16 cores (Ryzen 9 3950X)?

My Docker image is simple: I have 5 containers from the same image, pointing to different folders, to run microservices.

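FROM node:14.10.0
# Only ARG may precede FROM, so the timezone step has to come after it
RUN echo "America/Sao_Paulo" > /etc/timezone

WORKDIR /Development

RUN npm install pm2 -g

# Assumed step (not in the original snippet): copy the project in so that
# yarn install and yarn build have a package.json to work with
COPY . .

RUN yarn install

RUN yarn build

EXPOSE 3000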

(Node.js Express project with no cron jobs or scheduled routines)

How can this use so much RAM with the WSL2 integration? It makes no sense. The problem of the cache not being cleaned and RAM usage not staying consistent needs to be fixed urgently.

Hi @samuk190, thanks for the feedback.

I've also noticed that when DD on WSL starts and is left idle, its RAM usage grows. Likewise, if you run a few containers and then remove them, RAM usage increases over time. The figure I posted above does not show this.

In all cases, however, I've noticed that the RAM increase is mainly due to the Linux kernel page cache inside the WSL VM growing over time. I haven't seen any memory leak in Docker Engine or its related processes yet.

If you hit this issue again, please post the output of free from inside the WSL distro as I would be curious to see the memory distribution. This will help me ensure the solution we are building is the right one.
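To see at a glance how much of the WSL VM's memory is page cache, a minimal sketch that summarizes /proc/meminfo in MiB. It assumes procps-ng-style grouping for `free`'s buff/cache column (Buffers + Cached + SReclaimable); other `free` builds such as BusyBox may group these differently, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize /proc/meminfo in MiB, approximating how
# procps-ng `free` groups memory (buff/cache = Buffers + Cached + SReclaimable).

# mem_summary [FILE] -> prints "total=<MiB> free=<MiB> buff_cache=<MiB> available=<MiB>"
mem_summary() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemFree:/      { free = $2 }
        /^MemAvailable:/ { avail = $2 }
        /^Buffers:/      { buffers = $2 }
        /^Cached:/       { cached = $2 }
        /^SReclaimable:/ { sreclaim = $2 }
        END {
            printf "total=%d free=%d buff_cache=%d available=%d\n",
                total / 1024, free / 1024,
                (buffers + cached + sreclaim) / 1024, avail / 1024
        }' "${1:-/proc/meminfo}"
}
```

A large buff_cache value next to a small free value would be consistent with the page-cache growth described above rather than a leak in a specific process.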

Thanks.

nonovd commented

Same here.

Host:

  • Windows 11 22H2 (22621.1848)
  • 64GB RAM
  • No .wslconfig file (default config)
  • Docker Desktop 4.20.1 (110738)

When DD is started and left idle, WSL VM RAM grows to 17GB in 5 minutes.

[screenshots]

WSL:

[screenshots]

When DD is not running:

[screenshots]

This started happening after upgrading to the latest version. I shouldn't have upgraded ;(
Does anyone know how to find out which version I was on before the update? Are there any logs or something I can look at?

Hi @nonovd, @iSplasher,

For a better experience with Docker Desktop on WSL, consider enabling the experimental WSL autoMemoryReclaim feature (available since WSL 1.3.10) by adding the following in your .wslconfig file:

[experimental]
autoMemoryReclaim=true

Then shutdown and restart WSL.

This will cause WSL to better reclaim unused memory inside the WSL VM and return it back to the Windows host.

Without this, the Linux kernel inside the WSL VM will not easily release memory back to the host since it does not know it's running inside a VM (and in particular, the kernel page cache will grow as Docker Desktop pulls and builds images or runs containers inside the WSL VM). In contrast, with autoMemoryReclaim enabled WSL will proactively return unused memory back to the host.

We recently updated the Docker Desktop docs to reflect this.

Hope that helps!

A small clarification to the last @ctalledo comment:

The autoMemoryReclaim setting is not a boolean setting; it is a string (see the docs).
So true is an invalid value here.

Available values are:

  • disabled (default).
  • gradual for the slow gradual release of cached memory (if CPU usage is continuously low for 5 minutes, reclaims a fixed portion of your VM’s memory size, which is calculated so that if your VM was full of cached memory it would go to zero cached memory after 30 minutes).
  • dropcache for instant release of cached memory.
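Given the bool/string confusion, a minimal sketch for checking what a .wslconfig actually sets. It assumes the setting lives under `[experimental]` as in the comment above, and that from inside a WSL distro the Windows file is reachable under /mnt/c/Users/<WindowsUser>/.wslconfig (default automount); the function name is hypothetical. A valid fragment would be `[experimental]` followed by `autoMemoryReclaim=gradual`, then `wsl --shutdown` and a restart.

```shell
#!/bin/sh
# Hypothetical helper: report the autoMemoryReclaim value in a .wslconfig
# and check it against the string values disabled, gradual, dropcache.

# check_reclaim FILE -> prints the value; exit status 0 only if it is valid
check_reclaim() {
    value=$(awk -F= '
        /^[ \t]*\[/ { in_exp = ($0 ~ /^[ \t]*\[experimental\]/) }
        in_exp && $1 ~ /^[ \t]*autoMemoryReclaim[ \t]*$/ {
            v = $2; gsub(/[ \t\r]/, "", v)   # trim spaces and Windows CR
        }
        END { print v }' "$1")
    echo "${value:-<unset>}"
    case "$value" in
        disabled|gradual|dropcache) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the `autoMemoryReclaim=true` form, this reports `true` and a non-zero exit status, matching the clarification above.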

Thanks @xak2000 for the clarification (initially it was a bool setting, later got changed to the string setting with the options you described).

Hi @bobloadmire,

Any reason to use gradual over dropcache?

I would recommend "gradual" over "dropcache", because it also releases WSL memory back to the host, but with less impact on apps running inside the WSL distros. If some of those apps need a lot of memory, then using "dropcache" can suddenly and significantly affect their performance (i.e., all of the memory holding cached pages in the WSL Linux kernel would be released back to the host machine at once).

On the other hand, "dropcache" returns memory to the host machine right away, so if you need that memory to run host applications and care less about the impact on your WSL apps, "dropcache" may work better.

So it depends on your use case as you can see from the answer above. Hope that helps!