microsoft/Windows-Containers

Process Isolation is very slow as compared to HyperV Containers on Server 2019

saraf-akshay opened this issue · 11 comments

Describe the bug
Slowness in cloning source when running multiple containers simultaneously in process isolation.

| Isolation Mode | Time in Git clone | Containers running in parallel | Comments |
|---|---|---|---|
| Process | 9 mins | 1 | |
| HyperV | 8.5 mins | 1 | |
| Process | 21 mins | 10 | <-- This is the problem |
| HyperV | 11 mins | 10 | |
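
For reference, the slowdown factors implied by these timings can be computed with a short script. This is only a sketch: the numbers are copied from the table above, and the function and variable names are illustrative.

```python
# Clone times in minutes, copied from the table above.
clone_minutes = {
    ("Process", 1): 9.0,
    ("HyperV", 1): 8.5,
    ("Process", 10): 21.0,
    ("HyperV", 10): 11.0,
}


def slowdown(mode: str) -> float:
    """Clone time with 10 parallel containers divided by the time with 1."""
    return clone_minutes[(mode, 10)] / clone_minutes[(mode, 1)]


def process_vs_hyperv(parallel: int) -> float:
    """Process clone time divided by HyperV clone time at a given parallelism."""
    return clone_minutes[("Process", parallel)] / clone_minutes[("HyperV", parallel)]


if __name__ == "__main__":
    for mode in ("Process", "HyperV"):
        print(f"{mode}: {slowdown(mode):.2f}x slower at 10 containers than at 1")
    for n in (1, 10):
        print(f"Process vs HyperV at {n} container(s): {process_vs_hyperv(n):.2f}x")
```

With these numbers, process isolation is about 2.3x slower at 10 containers than at 1, HyperV is about 1.3x slower, and at 10 containers process isolation takes about 1.9x as long as HyperV.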

As the number of containers on the server increases, container performance degrades significantly, but only in process isolation. I am not worried about minor performance differences. The same degradation happens in process isolation when I compile in these containers using nmake.

These 10 containers I mentioned above are triggered by a Jenkins pipeline using Kubernetes. Here is the yaml code I used:

apiVersion: v1
kind: Pod
spec:
  tolerations:
  - effect: NoSchedule
    key: custom/build-hosts
    operator: Exists
  containers:
  - name: jnlp
    image: <image link redacted>
    command:
    - powershell
    args:
    - cp -R C:\\privconf\\*  C:\\Users\\ContainerAdministrator;
    - C:\\jenkinsscript\\jenkins.ps1
    resources:
      limits:
        cpu: 12
        memory: 16Gi
      requests:
        cpu: 12
        memory: 16Gi
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_HOST_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    volumeMounts:
    - mountPath: /privconf
      name: credential-volume
    - mountPath: /gitcache
      name: cache-volume
    - mountPath: /jenkinsscript
      name: jenkins-script
  volumes:
  - hostPath:
      path: D:/agentconf
      type: ""
    name: credential-volume
  - hostPath:
      path: D:/agentcache
      type: ""
    name: cache-volume
  - configMap:
      defaultMode: 420
      name: jenkins-script
    name: jenkins-script
  nodeSelector:
    custom/fcds: test_akshay

The HyperV data was gathered using Docker Swarm, as K8S doesn't support HyperV isolation.

dockerSwarm {
    label "docker-agent"
    image "<image link redacted>"
    limitsNanoCPUs 12000000000
    limitsMemoryBytes 17179860384
    reservationsNanoCPUs 12000000000
    reservationsMemoryBytes 17179860384
}

The physical host I ran this on is a bare metal server with 208 logical cores (104 physical cores with Hyper-Threading enabled).
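
As a sanity check, the Swarm limits can be converted to the units used in the Kubernetes yaml, and the total CPU requested by 10 containers can be compared with the host's core count. This is a rough sketch with illustrative constant names; it covers only CPU commitment because the host's memory size is not stated here.

```python
NANO_CPUS_PER_CPU = 1_000_000_000  # Docker expresses CPU limits in units of 1e-9 CPU
BYTES_PER_GIB = 1024 ** 3

# Values copied from the dockerSwarm block, the pod spec, and the host description.
swarm_nano_cpus = 12_000_000_000
swarm_memory_bytes = 17_179_860_384
k8s_cpus = 12
k8s_memory_gib = 16
containers = 10
logical_cores = 208
physical_cores = 104

swarm_cpus = swarm_nano_cpus / NANO_CPUS_PER_CPU
swarm_memory_gib = swarm_memory_bytes / BYTES_PER_GIB
# How far the Swarm memory limit falls below the Kubernetes 16Gi limit.
memory_gap_bytes = k8s_memory_gib * BYTES_PER_GIB - swarm_memory_bytes
total_cpus = containers * k8s_cpus

if __name__ == "__main__":
    print(f"Swarm CPU limit: {swarm_cpus:g} CPUs (Kubernetes: {k8s_cpus})")
    print(f"Swarm memory limit: {swarm_memory_gib:.6f} GiB "
          f"({memory_gap_bytes} bytes below {k8s_memory_gib}Gi)")
    print(f"CPU requested by {containers} containers: "
          f"{total_cpus} of {logical_cores} logical cores")
    print(f"More than the {physical_cores} physical cores: {total_cpus > physical_cores}")
```

The CPU limits match at 12 CPUs per container, and the Swarm memory limit is 8800 bytes below 16Gi. Ten containers request 120 CPUs in total, which fits within the 208 logical cores but is more than the 104 physical cores.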

To Reproduce
Trigger 10 parallel containers on the same host at the same time, all cloning the same repository. That should be enough to reproduce the issue.
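
A minimal way to approximate this without Jenkins is sketched below. Note the assumptions: it uses plain `docker run` rather than the Kubernetes or Swarm setups described above, `REPO_URL` is a placeholder, the container names and clone path are made up for illustration, and it assumes `git` is on the image's PATH.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

IMAGE = "jenkins/inbound-agent:3107.v665000b_51092-7-jdk11-windowsservercore-ltsc2019"
REPO_URL = "https://example.invalid/your/repo.git"  # placeholder: use the repository under test


def build_command(isolation: str, index: int) -> list[str]:
    """Build a `docker run` command that clones REPO_URL inside one container."""
    return [
        "docker", "run", "--rm",
        "--isolation", isolation,            # "process" or "hyperv"
        "--cpus", "12", "--memory", "16g",   # same limits as the pod spec
        "--name", f"clone-test-{isolation}-{index}",  # hypothetical name
        "--entrypoint", "git",               # assumes git is on PATH in the image
        IMAGE,
        "clone", REPO_URL, "C:\\src",
    ]


def timed_run(command: list[str]) -> float:
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(command, check=True)
    return time.monotonic() - start


def run_parallel(isolation: str, count: int) -> list[float]:
    """Start `count` containers at once and return each clone's duration."""
    commands = [build_command(isolation, i) for i in range(count)]
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(timed_run, commands))


if __name__ == "__main__":
    if len(sys.argv) != 2 or sys.argv[1] not in ("process", "hyperv"):
        print("usage: python repro.py process|hyperv")
    else:
        durations = run_parallel(sys.argv[1], 10)
        print(f"{sys.argv[1]}: slowest clone took {max(durations) / 60:.1f} min")
```

Running it once with `process` and once with `hyperv` on the same host gives a rough comparison of the two isolation modes under the same parallel load.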

Expected behavior
The expectation is for Process Isolation to work on par or better than HyperV Isolation.

Configuration:

  • Edition: Windows Server 2019
  • Base Image being used: jenkins/inbound-agent:3107.v665000b_51092-7-jdk11-windowsservercore-ltsc2019
  • Container engine: Docker
  • Container Engine version:
Client:
 Version:           25.0.0
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        e758fe5
 Built:             Thu Jan 18 17:10:49 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Engine - Community
Engine:
 Version:          25.0.0
 API version:      1.44 (minimum version 1.24)
 Go version:       go1.21.6
 Git commit:       615dfdf
 Built:            Thu Jan 18 17:09:34 2024
 OS/Arch:          windows/amd64
 Experimental:     false

Additional context

I have verified that there is no resource overprovisioning. Windows Defender is disabled, and all my processes (including git and git-lfs) and the directories where source code is checked out are on the exclusion list, as mentioned here: #149
I have also verified that I have the Defender fix, which was released here: #345

Hey @saraf-akshay, could you share what you're seeing with Windows Server 2022 process isolation?

We no longer ship OS-level fixes for Windows Server 2019 because it is now out of mainstream support (we only address security fixes): https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2019

@fady-azmy-msft: Thanks for your response. I'm working on preparing a server with Server 2022. It might take a couple of days. I'll keep you posted.

@fady-azmy-msft, @ntrappe-msft: There is still slowness.

Server 2022 is a lot better than Server 2019. When I run 10 containers in parallel on a host (essentially trying to run the host at full capacity) with the CPU and memory restrictions shown in the yaml file in my first comment, process isolation was 2x slower than HyperV isolation on Server 2019, whereas it is 1.25x slower on Server 2022.

Here is what I have experienced with process isolation compared to Hyper-V isolation. I have seen cascading container failures, and even containers that crash and can never recover; they have to be redeployed. Performance on my SHIR containers is night-and-day better now with Hyper-V isolation.

Host Running 2019 DC
Container 2019 core latest

Azure/Azure-Data-Factory-Integration-Runtime-in-Windows-Container#7

Hello @Howard-Haiyang-Hao @fady-azmy-msft @ntrappe-msft
Just checking in. Any update on this?

This issue has been open for 30 days with no updates.
@Howard-Haiyang-Hao, please provide an update or close this issue.

I'm now running 80+ SHIR containers with Hyper-V isolation successfully, with little to no issues. Without Hyper-V isolation, the most I could run was about 25, and that also created issues that caused containers to corrupt themselves completely at random. Please make a Linux-compatible SHIR application for ADF / Synapse!