Process isolation is very slow compared to Hyper-V containers on Server 2019
saraf-akshay opened this issue · 11 comments
Describe the bug
Slowness in cloning source when running multiple containers simultaneously in process isolation.
Isolation Mode | Git clone time | Containers running in parallel | Comments
---|---|---|---
Process | 9 mins | 1 | |
HyperV | 8.5 mins | 1 | |
Process | 21 mins | 10 | <-- This is the problem |
HyperV | 11 mins | 10 | |
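For reference, the scaling penalty implied by the table can be computed from the reported times (the numbers below come straight from the table, not from new measurements):

```shell
# Clone time with 10 parallel containers divided by clone time with 1
# container, per isolation mode, using the minutes reported in the table.
process_penalty=$(awk 'BEGIN { printf "%.2f", 21 / 9 }')
hyperv_penalty=$(awk 'BEGIN { printf "%.2f", 11 / 8.5 }')
echo "process isolation: ${process_penalty}x slower at 10 containers"
echo "hyperv isolation:  ${hyperv_penalty}x slower at 10 containers"
```

So process isolation degrades roughly 2.3x under load while Hyper-V degrades only about 1.3x, which is the asymmetry being reported.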
As the number of containers on the server increases, per-container performance degrades significantly, but only under process isolation. I am not worried about minor performance differences. The same happens when I compile inside these containers using nmake: performance degrades under process isolation.
The 10 containers I mentioned above are launched by a Jenkins pipeline using Kubernetes. Here is the YAML I used:
```yaml
apiVersion: v1
kind: Pod
spec:
  tolerations:
  - effect: NoSchedule
    key: custom/build-hosts
    operator: Exists
  containers:
  - name: jnlp
    image: <image link redacted>
    command:
    - powershell
    args:
    - cp -R C:\\privconf\\* C:\\Users\\ContainerAdministrator;
    - C:\\jenkinsscript\\jenkins.ps1
    resources:
      limits:
        cpu: 12
        memory: 16Gi
      requests:
        cpu: 12
        memory: 16Gi
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_HOST_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    volumeMounts:
    - mountPath: /privconf
      name: credential-volume
    - mountPath: /gitcache
      name: cache-volume
    - mountPath: /jenkinsscript
      name: jenkins-script
  volumes:
  - hostPath:
      path: D:/agentconf
      type: ""
    name: credential-volume
  - hostPath:
      path: D:/agentcache
      type: ""
    name: cache-volume
  - configMap:
      defaultMode: 420
      name: jenkins-script
    name: jenkins-script
  nodeSelector:
    custom/fcds: test_akshay
```
The Hyper-V data was gathered using Docker Swarm, as Kubernetes doesn't support Hyper-V isolation.
```groovy
dockerSwarm {
    label "docker-agent"
    image "<image link redacted>"
    limitsNanoCPUs 12000000000
    limitsMemoryBytes 17179860384
    reservationsNanoCPUs 12000000000
    reservationsMemoryBytes 17179860384
}
```
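For anyone reproducing the Swarm side without the Jenkins plugin, a roughly equivalent invocation can be sketched with the Docker CLI (the 12000000000 nanoCPUs map to 12 CPUs, and ~17 GB maps to 16g). The service name and image below are placeholders, and the command is only echoed here rather than executed:

```shell
# Dry-run sketch of an equivalent `docker service create` call for the
# Hyper-V case. Service name and image are placeholders; we echo the
# command instead of running it so it can be inspected first.
ISOLATION="hyperv"                          # or "process" for comparison
IMAGE="example.com/jenkins-agent:ltsc2019"  # placeholder image
echo docker service create \
  --name build-agent \
  --isolation "$ISOLATION" \
  --limit-cpu 12 --limit-memory 16g \
  --reserve-cpu 12 --reserve-memory 16g \
  "$IMAGE"
```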
The physical host I ran this on is a bare-metal server with 104 physical cores (208 logical cores with Hyper-Threading enabled).
To Reproduce
Trigger 10 parallel containers on the same host at the exact same time, all cloning the same repository; that should reproduce the issue.
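The repro step above can be sketched as a small loop; the image and repository URL are placeholders, and the commands are only printed (dry run) so they can be reviewed, adjusted, and backgrounded before actually running them:

```shell
# Print (dry run) the commands that would launch N identical containers,
# each cloning the same repository. IMAGE and REPO are placeholders; run
# the printed commands in the background (&) to get true parallelism.
N=10
ISOLATION="process"   # switch to "hyperv" to compare the two modes
IMAGE="mcr.microsoft.com/windows/servercore:ltsc2019"  # placeholder image
REPO="https://example.com/big-repo.git"                # placeholder repo
i=1
while [ "$i" -le "$N" ]; do
  echo docker run --rm --isolation "$ISOLATION" --cpus 12 --memory 16g \
    "$IMAGE" powershell -Command "git clone $REPO C:\\src"
  i=$((i + 1))
done
```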
Expected behavior
The expectation is for process isolation to perform on par with, or better than, Hyper-V isolation.
Configuration:
- Edition: Windows Server 2019
- Base Image being used: jenkins/inbound-agent:3107.v665000b_51092-7-jdk11-windowsservercore-ltsc2019
- Container engine: Docker
- Container Engine version:
```
Client:
 Version:           25.0.0
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        e758fe5
 Built:             Thu Jan 18 17:10:49 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.0
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       615dfdf
  Built:            Thu Jan 18 17:09:34 2024
  OS/Arch:          windows/amd64
  Experimental:     false
```
Additional context
I have verified that there is no resource over-provisioning, my Windows Defender is disabled, and all my processes (including git and git-lfs) and the directories where source code is checked out are on the exclusion list, as mentioned here: #149
I have also verified that I have the Defender fix that was released here: #345
Hey @saraf-akshay, could you share what you're seeing with Windows Server 2022 process isolation?
We no longer ship OS-level fixes for Windows Server 2019 because it is out of mainstream support (it only receives security fixes): https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2019
@fady-azmy-msft : Thanks for your response. I'm working on preparing a server with Server 2022. It might take a couple of days. I'll keep you posted.
@fady-azmy-msft ,@ntrappe-msft : There is still slowness.
Server 2022 is a lot better than Server 2019. On Server 2019, process isolation was 2x slower than Hyper-V isolation; on Server 2022 it is 1.25x slower when I run 10 containers in parallel on a host (essentially running the host at full capacity) with the CPU and memory restrictions shown in the YAML in my first comment.
Here is what I have experienced with process isolation compared to Hyper-V isolation: cascading container failures, and even containers that crash and can NEVER recover; they have to be redeployed. Performance is night-and-day better on my SHIR containers now with Hyper-V isolation.
Host Running 2019 DC
Container 2019 core latest
Azure/Azure-Data-Factory-Integration-Runtime-in-Windows-Container#7
Hello @Howard-Haiyang-Hao @fady-azmy-msft @ntrappe-msft
Just checking in, Any update on this?
This issue has been open for 30 days with no updates.
@Howard-Haiyang-Hao, please provide an update or close this issue.
I'm now running 80+ SHIR containers with Hyper-V isolation successfully, with little to no issues. Without Hyper-V isolation the max I could run was about 25, and even that caused containers to completely corrupt themselves at random. Please make a Linux-compatible SHIR application for ADF / Synapse!