microsoft/Windows-Containers

Fatal SSL errors occur in some applications unless curl is run inside the container, whose side effects somehow resolve the issue

doctorpangloss opened this issue · 8 comments

Describe the bug
There is a surprising interaction among three things: the way certificates are configured in Windows containers; the differences between how Docker Desktop / dockerd.exe and Kubernetes mount (or do not mount) host certificates into containers; and the various workarounds that Windows applications use to get around certificate issues.

The net result is that a fatal SSL error will always occur in a production Kubernetes cluster, unless some other side-effect-causing command resolves the issue.

To Reproduce
In a mcr.microsoft.com/windows/servercore:10.0.20348.1970 image:

FROM mcr.microsoft.com/windows/servercore:10.0.20348.1970 as build
USER ContainerAdministrator

ARG PIP_DISABLE_PIP_VERSION_CHECK=1
ARG PIP_NO_CACHE_DIR=1

RUN powershell -NoProfile -ExecutionPolicy Bypass -Command "[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))"
RUN choco install -y python --version=3.11.9
RUN pip install fsspec

FROM mcr.microsoft.com/windows/servercore:10.0.20348.1970 as final
USER ContainerAdministrator
COPY --link --from=build C:/Python311 C:/Python311
SHELL ["cmd", "/S", "/C"]
RUN setx /M PATH "%PATH%;C:\Python311;C:\Python311\Scripts"

Python has now been copied into the image with no side effects.

import fsspec

url = 'https://fastly.picsum.photos/id/269/536/354.jpg?hmac=LprBr5lQFyGBvbqcZYCZ6hoF4eJ_hfCEplsS2XSFOhY'
file_name = 'image.jpg'

with fsspec.open(url, mode='rb') as f:
    with open(file_name, 'wb') as file:
        file.write(f.read())

will cause the error

raise ClientConnectorCertificateError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host fastly.picsum.photos:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')]

If I run

curl -I https://upload.wikimedia.org/wikipedia/commons/c/c9/Rainbow-diagram-ROYGBIV.svg

i.e., curl an unrelated URL, but not https://www.google.com, the error goes away.
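One way to observe this side effect is to list the CA certificates that Python's default SSL context has loaded, once before running curl and once after (in a fresh interpreter), and compare the two lists. The following is a minimal sketch using only the standard library; the helper name trusted_root_names is hypothetical, and it assumes the failing library builds its TLS context from Python's defaults.

```python
import ssl


def trusted_root_names(context=None):
    """Return sorted names of the CA certificates loaded into an SSLContext.

    With no argument, a fresh default context is created, which on Windows
    reads the system certificate stores at creation time.
    """
    if context is None:
        context = ssl.create_default_context()
    names = set()
    for cert in context.get_ca_certs():
        # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
        fields = {key: value for rdn in cert.get("subject", ()) for key, value in rdn}
        name = fields.get("commonName") or fields.get("organizationName")
        if name:
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    for name in trusted_root_names():
        print(name)
```

If the curl call is what installs the missing root, the second listing should contain at least one name that the first does not.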

This also reproduced when the reloadium Python package was installed during a Docker build, which is probably related to pip_system_certs causing this issue to occur earlier. Installing Python by copying it was not essential in this case; the error occurred in the build image.

Expected behavior
Fatal SSL errors should not occur unexpectedly.

Configuration:

  • Edition: Windows 2022 LTSC
  • Base Image being used: Windows Server Core
  • Container engine:

Docker Desktop test:

Client:
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Fri Feb 23 02:40:51 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.28.0 (139021)
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435
  Built:            Tue Feb  6 20:55:49 2024
  OS/Arch:          windows/amd64
  Experimental:     false

Kubernetes test:

clientVersion:
  buildDate: "2023-06-14T09:53:42Z"
  compiler: gc
  gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
  gitTreeState: clean
  gitVersion: v1.27.3
  goVersion: go1.20.5
  major: "1"
  minor: "27"
  platform: windows/amd64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2024-04-09T05:12:22Z"
  compiler: gc
  gitCommit: 1649f592f1909b97aa3c2a0a8f968a3fd05a7b8b
  gitTreeState: clean
  gitVersion: v1.26.15+k0s
  goVersion: go1.21.9
  major: "1"
  minor: "26"
  platform: linux/amd64

Test node:

System Info:
  Machine ID:                                AppMana-xxx
  System UUID:                               7x-xxx-xxx-xxx-x5
  Boot ID:                                   58
  Kernel Version:                            10.0.20348.1547
  OS Image:                                  Windows Server 2022 Datacenter
  Operating System:                          windows
  Architecture:                              amd64
  Container Runtime Version:                 containerd://1.7.17
  Kubelet Version:                           v1.26.2
  Kube-Proxy Version:                        v1.26.2

Further discussion:

(1) Windows 2022 certificate stores do not resemble the ones that are shipped with web browsers.
(2) There's no simple way to install those certificates into the system certificate store, like apt install -y ca-certificates.
(3) Some Python packages, like requests, ship with their own certificates. Others, like aiohttp, read them from the system. When curl -I ... is called at the system level, the certificate gets populated in "the right place" on the system, and aiohttp can later find it. This method does not work for all URLs, though; certificates are apparently only loaded eagerly.
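To see where a given Python installation looks for certificates, the standard ssl module can report its default verify paths and how many CA certificates a default context actually loaded. This is a minimal diagnostic sketch, assuming the failing library relies on ssl.create_default_context(); the function name describe_default_trust is hypothetical.

```python
import ssl


def describe_default_trust():
    """Summarize where Python's ssl module looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    stats = context.cert_store_stats()
    return {
        # Environment variables that override the compiled-in locations.
        "cafile_env": paths.openssl_cafile_env,
        "capath_env": paths.openssl_capath_env,
        # Resolved locations; None when the file or directory does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Number of CA certificates loaded into the default context.
        "loaded_ca_count": stats["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in describe_default_trust().items():
        print(f"{key}: {value}")
```

A low loaded_ca_count inside the container, compared with the same call on a desktop Windows machine, would be consistent with point (1) above.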

A good resolution to this issue would be some way to easily install the root certificates into Windows from the web bundles. This appears to have been a persistent problem for application developers, maybe for decades. For the sake of application developers seeking an experience similar to Linux, Microsoft should provide a ca-certificates-like solution once and for all.

This is likely a root cause of a significant number of vague issues installing things in Windows containers.

We'll get some engineers to take a look at this.

🔖 ADO 51748773 (Internal)

I'm not sure which certificate you're using in your scenario, but I did the following test with certifi (which packages the CA certificates from the requests library) from python to get certificates to use and it looks like SSL will resolve.

You may opt for a different set of certificates to use in your scenario.

Step 1. Set up your image working directory

  1. Create a working directory to build your image

  2. Add your dockerfile to your working directory

a. Here's a modified version from the one you shared:

FROM mcr.microsoft.com/windows/servercore:10.0.20348.1970 as build
USER ContainerAdministrator

ARG PIP_DISABLE_PIP_VERSION_CHECK=1
ARG PIP_NO_CACHE_DIR=1

RUN powershell -NoProfile -ExecutionPolicy Bypass -Command "[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))"
RUN choco install -y python --version=3.11.9
# RUN pip install fsspec
COPY requirements.txt requirements.txt
RUN python -m pip install -r C:\requirements.txt

FROM mcr.microsoft.com/windows/servercore:10.0.20348.1970 as final
USER ContainerAdministrator
COPY --link --from=build C:/Python311 C:/Python311
COPY test_fsspec.py test_fsspec.py
SHELL ["cmd", "/S", "/C"]
RUN setx /M PATH "%PATH%;C:\Python311;C:\Python311\Scripts"
  3. Add a requirements.txt file to your working directory with the following contents:
aiohttp==3.9.5
aiosignal==1.3.1
attrs==23.2.0
certifi==2024.6.2
frozenlist==1.4.1
fsspec==2024.6.0
idna==3.7
multidict==6.0.5
yarl==1.9.4
  4. Add a test_fsspec.py to your working directory with the following contents:
import fsspec
import os
import ssl
import certifi

# set the openssl ca file environment variable to your certificate path
# If you have a different certificate to use, you can specify the path here
os.environ[ssl.get_default_verify_paths().openssl_cafile_env] = certifi.where()

url = 'https://fastly.picsum.photos/id/269/536/354.jpg?hmac=LprBr5lQFyGBvbqcZYCZ6hoF4eJ_hfCEplsS2XSFOhY'
file_name = 'image.jpg'

with fsspec.open(url, mode='rb') as f:
    with open(file_name, 'wb') as file:
        file.write(f.read())

You can update os.environ[ssl.get_default_verify_paths().openssl_cafile_env] to point to a different certificate path. In this scenario, we are pointing to the path provided by certifi.where().
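If you point the variable at a different bundle, a small guard can catch a wrong path before any request is made. The sketch below wraps the same assignment in a hypothetical helper, use_ca_bundle, which is not part of certifi or fsspec; it only checks that the file exists and then sets the variable named by ssl.get_default_verify_paths().openssl_cafile_env.

```python
import os
import ssl


def use_ca_bundle(ca_path, environ=None):
    """Point OpenSSL's CA-file environment variable at ca_path.

    Raises FileNotFoundError if ca_path is not an existing file.
    Returns the name of the environment variable that was set.
    """
    if environ is None:
        environ = os.environ
    if not os.path.isfile(ca_path):
        raise FileNotFoundError(f"CA bundle not found: {ca_path}")
    env_name = ssl.get_default_verify_paths().openssl_cafile_env
    environ[env_name] = ca_path
    return env_name
```

For example, use_ca_bundle(certifi.where()) would have the same effect as the assignment in test_fsspec.py. It should be called before the first HTTPS request so that the context is created with the variable already set.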

Step 2. Build your image

  1. In your working directory, you can build your image:
# in your image working directory
docker build -f Dockerfile -t test-ssl:ltsc2022 .

Step 3. Run your image

  1. You can run your built container image
docker run -it test-ssl:ltsc2022 cmd
  2. Within your running container image, you can also call python test_fsspec.py to check that you can use SSL
python test_fsspec.py
  3. You should be able to confirm that your file was downloaded by running the following command in your container
dir image.jpg

I'm not sure which certificate you're using in your scenario, but I did the following test with certifi (which packages the CA certificates from the requests library) from python to get certificates to use and it looks like SSL will resolve.

Thanks for investigating a workaround.

Is there an equivalent of ca-certificates for Windows? There ought to be.

I've updated the notes to use trusted certificates from Windows Update, see the example below:

Step 1. Set up your image working directory

  1. Create a working directory to build your image

  2. Add your dockerfile to your working directory

a. Here's a modified version from the one you shared:

FROM mcr.microsoft.com/windows/servercore:10.0.20348.1970 as build
USER ContainerAdministrator

ARG PIP_DISABLE_PIP_VERSION_CHECK=1
ARG PIP_NO_CACHE_DIR=1

RUN powershell -NoProfile -ExecutionPolicy Bypass -Command "[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))"
RUN choco install -y python --version=3.11.9
# RUN pip install fsspec
COPY requirements.txt requirements.txt
RUN python -m pip install -r C:\requirements.txt

FROM mcr.microsoft.com/windows/servercore:10.0.20348.1970 as final
USER ContainerAdministrator
COPY --link --from=build C:/Python311 C:/Python311
COPY update_cert.ps1 update_cert.ps1
COPY test_fsspec.py test_fsspec.py
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
RUN .\update_cert.ps1
SHELL ["cmd", "/S", "/C"]
RUN setx /M PATH "%PATH%;C:\Python311;C:\Python311\Scripts"
  3. Add an update_cert.ps1 file to your working directory with the following contents:
$ROOTSTORE_SST_PATH = "C:\rootstore.sst"
$CERT_STORE_LOCATION = "cert:\LocalMachine\Root"

# Get rootstore from WU
# https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/dn265983(v=ws.11)#to-create-a-subset-of-trusted-certificates
certutil -generateSSTFromWU $ROOTSTORE_SST_PATH

# Import the certificates
$(Get-ChildItem -Path $ROOTSTORE_SST_PATH) | Import-Certificate -CertStoreLocation $CERT_STORE_LOCATION

# clean up rootstore sst
Remove-Item $ROOTSTORE_SST_PATH -Force

This will download the trusted root certificates published through Windows Update and import them into the LocalMachine\Root store.

  4. Add a requirements.txt file to your working directory with the following contents:
aiohttp==3.9.5
aiosignal==1.3.1
attrs==23.2.0
frozenlist==1.4.1
fsspec==2024.6.0
idna==3.7
multidict==6.0.5
yarl==1.9.4
  5. Add a test_fsspec.py to your working directory with the following contents:
import fsspec

url = 'https://fastly.picsum.photos/id/269/536/354.jpg?hmac=LprBr5lQFyGBvbqcZYCZ6hoF4eJ_hfCEplsS2XSFOhY'
file_name = 'image.jpg'

with fsspec.open(url, mode='rb') as f:
    with open(file_name, 'wb') as file:
        file.write(f.read())

Step 2. Build your image

  1. In your working directory, you can build your image:
# in your image working directory
docker build -f Dockerfile -t test-ssl:ltsc2022 .

Step 3. Run your image

  1. You can run your built container image
docker run -it test-ssl:ltsc2022 cmd
  2. Within your running container image, you can also call python test_fsspec.py to check that you can use SSL
python test_fsspec.py
  3. You should be able to confirm that your file was downloaded by running the following command in your container
dir image.jpg
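To check that update_cert.ps1 populated the store, one option is to count the certificates that Python can see in the Windows ROOT system store from inside the container. This is a minimal sketch with a hypothetical helper name, count_windows_roots; ssl.enum_certificates is only available on Windows, so the function returns None elsewhere.

```python
import ssl
import sys


def count_windows_roots(store_name="ROOT"):
    """Count X.509 certificates in a Windows system store.

    Returns None on platforms other than Windows, where
    ssl.enum_certificates is not available.
    """
    if sys.platform != "win32":
        return None
    # Each entry is a (cert_bytes, encoding_type, trust) tuple.
    return sum(
        1
        for _cert, encoding, _trust in ssl.enum_certificates(store_name)
        if encoding == "x509_asn"
    )


if __name__ == "__main__":
    print(count_windows_roots())
```

Running it in an image built without update_cert.ps1 and again in one built with it should show a larger count in the second case, assuming the import succeeded.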

related to #506

Can we keep this issue separate? Seems like a feature request as mentioned by @ntrappe-msft

update_cert.ps1 should be standard in all application containers, like .NET 7.0, Python, etc. This looks good to me, thanks for the investigation.