Azure/WALinuxAgent

SSL: CERTIFICATE_VERIFY_FAILED

cacheguard opened this issue · 9 comments

Hi

We got this warning:

2023-06-03T07:01:23.197911Z WARNING Daemon Daemon Fetch failed: [HttpError] [HTTP Failed] GET https://md-ssd-kp4mgq3cvbvg.z27.blob.storage.azure.net/$system/myCG.9191fa07-ccc3-4cec-9d77-6601cfc0e58a.vmSettings -- IOError [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997) -- 6 attempts made

Something should be fixed somewhere I think. We suppose that the certificate associated to md-ssd-kp4mgq3cvbvg.z27.blob.storage.azure.net is signed by a private CA certificate. Why? We strictly control our CA root base and this is not something we can allow on our network appliance.

Thank you for your understanding.
Best Regards,

can you provide a log snippet? thanks

I have already provided the log snippet. Please see above... (sippet from /var/log/waagent.log). What other logs do you need? Thanks

@cacheguard could you please provide a longer log segment for before and after this failure? We need some context about what was happening at the time of this failure.

There isn't any entry line before the first warning on the certificate. Find below the whole log:

2023-06-08T14:04:24.575602Z WARNING Daemon Daemon Fetch failed: [HttpError] [HTTP Failed] GET https://md-ssd-3jxkjrr4hm2n.z27.blob.storage.azure.net/$system/myCG.7e6a4b4a-d791-4427-a0cf-4b609d59f486.vmSettings -- IOError [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997) -- 6 attempts made
2023-06-08T14:04:24.596683Z INFO Daemon Daemon Default channel changed to HostGAPlugin channel.
2023-06-08T14:04:24.604417Z INFO Daemon Fetch goal state completed
2023-06-08T14:04:24.616526Z INFO Daemon Daemon RDMA capabilities are not enabled, skipping
2023-06-08T14:04:24.616677Z INFO Daemon Daemon Installed Agent WALinuxAgent-2.9.0.4 is the most current agent
2023-06-08T14:04:24.964952Z INFO ExtHandler ExtHandler Azure Linux Agent (Goal State Agent version 2.9.0.4)
2023-06-08T14:04:24.969663Z INFO ExtHandler ExtHandler OS: cacheguard 2.0.2
2023-06-08T14:04:24.969733Z INFO ExtHandler ExtHandler Python: 3.10.2
2023-06-08T14:04:24.971749Z INFO ExtHandler ExtHandler Distro: cacheguard-2.0.2; OSUtil: CacheGuardOSUtil; AgentService: waagent; Python: 3.10.2; systemd: False; LISDrivers: Failed; logrotate: logrotate 3.19.0;
2023-06-08T14:04:24.971920Z INFO ExtHandler ExtHandler WireServer endpoint 168.63.129.16 read from file
2023-06-08T14:04:24.972122Z INFO ExtHandler ExtHandler Wire server endpoint:168.63.129.16
2023-06-08T14:04:24.987216Z INFO ExtHandler Fetched a new incarnation for the WireServer goal state [incarnation 1]
2023-06-08T14:04:24.997706Z INFO ExtHandler ExtHandler HostGAPlugin version: 1.0.8.140
2023-06-08T14:04:24.998194Z INFO ExtHandler
2023-06-08T14:04:24.998269Z INFO ExtHandler Fetched new vmSettings [HostGAPlugin correlation ID: 034bd747-118d-4750-b261-9335f24aa069 eTag: 8531963493112184260 source: Fabric]
2023-06-08T14:04:24.998573Z INFO ExtHandler The vmSettings originated via Fabric; will ignore them.
2023-06-08T14:04:24.999125Z INFO ExtHandler
2023-06-08T14:04:24.999263Z INFO ExtHandler Fetching full goal state from the WireServer [incarnation 1]
2023-06-08T14:04:32.198555Z INFO Daemon Daemon Agent WALinuxAgent-2.9.0.4 forwarding signal 15 to WALinuxAgent-2.9.0.4

2023-06-08T14:04:53.446440Z INFO Daemon Daemon Azure Linux Agent Version: 2.9.0.4
2023-06-08T14:04:53.447071Z INFO Daemon Daemon OS: cacheguard 2.0.2
2023-06-08T14:04:53.452215Z INFO Daemon Daemon Python: 3.10.2
2023-06-08T14:04:53.452394Z INFO Daemon Daemon Run daemon
2023-06-08T14:04:53.452530Z INFO Daemon Daemon No RDMA handler exists for distro='CacheGuard' version='2.0.2'
2023-06-08T14:04:53.452587Z INFO Daemon Daemon Using waagent for provisioning
2023-06-08T14:04:53.452730Z INFO Daemon Daemon Activate resource disk
2023-06-08T14:04:53.452793Z INFO Daemon Daemon Searching gen1 prefix 00000000-0001 or gen2 f8b3781a-1e82-4818-a1c3-63d806ec15bb
2023-06-08T14:04:53.454741Z INFO Daemon Daemon Found device: sdb
2023-06-08T14:04:53.458147Z INFO Daemon Daemon Examining partition table
2023-06-08T14:04:53.478835Z INFO Daemon Daemon GPT not detected, determining filesystem
2023-06-08T14:04:53.508694Z INFO Daemon Daemon sfdisk --part-type -f /dev/sdb 1 -n succeeded
2023-06-08T14:04:53.508848Z INFO Daemon Daemon The partition type is 83
2023-06-08T14:04:53.510613Z INFO Daemon Daemon Mount resource disk [mount -t ext4 /dev/sdb1 /mnt/azure]
2023-06-08T14:04:53.545176Z INFO Daemon Daemon Resource disk /dev/sdb is mounted at /mnt/azure with ext4
2023-06-08T14:04:53.550778Z INFO Daemon Daemon Clean protocol and wireserver endpoint
2023-06-08T14:04:53.551951Z INFO Daemon Daemon Provisioning already completed, skipping.
2023-06-08T14:04:53.552051Z INFO Daemon Daemon Detect protocol endpoint
2023-06-08T14:04:53.552105Z INFO Daemon Daemon Clean protocol and wireserver endpoint
2023-06-08T14:04:53.552177Z INFO Daemon Daemon WireServer endpoint is not found. Rerun dhcp handler
2023-06-08T14:04:53.552228Z INFO Daemon Daemon Test for route to 168.63.129.16
2023-06-08T14:04:53.552344Z WARNING Daemon Daemon No route exists to 168.63.129.16
2023-06-08T14:04:53.552396Z INFO Daemon Daemon Checking for dhcp lease cache
2023-06-08T14:04:53.552477Z INFO Daemon Daemon Cache exists [False]
2023-06-08T14:04:53.552524Z INFO Daemon Daemon Send dhcp request
2023-06-08T14:04:53.552572Z INFO Daemon Daemon Examine /proc/net/route for primary interface
2023-06-08T14:04:53.552691Z INFO Daemon Daemon Primary interface is [eth0]
2023-06-08T14:04:53.574476Z INFO Daemon Daemon Examine /proc/net/route for primary interface
2023-06-08T14:04:53.574691Z INFO Daemon Daemon Primary interface is [eth0]
2023-06-08T14:04:53.594293Z INFO Daemon Daemon Configure routes
2023-06-08T14:04:53.594391Z INFO Daemon Daemon Gateway:10.0.0.1
2023-06-08T14:04:53.594471Z INFO Daemon Daemon Routes:None
2023-06-08T14:04:53.596046Z INFO Daemon Daemon Wire server endpoint:168.63.129.16
2023-06-08T14:04:53.626926Z INFO Daemon Daemon Fabric preferred wire protocol version:2015-04-05
2023-06-08T14:04:53.627217Z INFO Daemon Daemon Wire protocol version:2012-11-30
2023-06-08T14:04:53.627267Z INFO Daemon Daemon Server preferred version:2015-04-05
2023-06-08T14:04:53.928446Z INFO Daemon Daemon Initializing goal state during protocol detection
2023-06-08T14:04:53.928606Z INFO Daemon Daemon Forcing an update of the goal state.
2023-06-08T14:04:53.933913Z INFO Daemon Fetched a new incarnation for the WireServer goal state [incarnation 1]
2023-06-08T14:04:53.948179Z INFO Daemon Daemon HostGAPlugin version: 1.0.8.140
2023-06-08T14:04:53.948719Z INFO Daemon
2023-06-08T14:04:53.948799Z INFO Daemon Fetched new vmSettings [HostGAPlugin correlation ID: 7d29cd74-e496-45a3-ad06-324f8134242d eTag: 8531963493112184260 source: Fabric]
2023-06-08T14:04:53.949092Z INFO Daemon The vmSettings originated via Fabric; will ignore them.
2023-06-08T14:04:53.950051Z INFO Daemon
2023-06-08T14:04:53.950130Z INFO Daemon Fetching full goal state from the WireServer [incarnation 1]
2023-06-08T14:05:25.304149Z WARNING Daemon Daemon Fetch failed: [HttpError] [HTTP Failed] GET https://md-ssd-3jxkjrr4hm2n.z27.blob.storage.azure.net/$system/myCG.7e6a4b4a-d791-4427-a0cf-4b609d59f486.vmSettings -- IOError [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997) -- 6 attempts made
2023-06-08T14:05:25.323074Z INFO Daemon Daemon Default channel changed to HostGAPlugin channel.
2023-06-08T14:05:25.332035Z INFO Daemon Fetch goal state completed
2023-06-08T14:05:25.343017Z INFO Daemon Daemon RDMA capabilities are not enabled, skipping
2023-06-08T14:05:25.343169Z INFO Daemon Daemon Installed Agent WALinuxAgent-2.9.0.4 is the most current agent
2023-06-08T14:05:25.690127Z INFO ExtHandler ExtHandler Azure Linux Agent (Goal State Agent version 2.9.0.4)
2023-06-08T14:05:25.690269Z INFO ExtHandler ExtHandler OS: cacheguard 2.0.2
2023-06-08T14:05:25.690330Z INFO ExtHandler ExtHandler Python: 3.10.2
2023-06-08T14:05:25.693443Z INFO ExtHandler ExtHandler Distro: cacheguard-2.0.2; OSUtil: CacheGuardOSUtil; AgentService: waagent; Python: 3.10.2; systemd: False; LISDrivers: Failed; logrotate: logrotate 3.19.0;
2023-06-08T14:05:25.693628Z INFO ExtHandler ExtHandler WireServer endpoint 168.63.129.16 read from file
2023-06-08T14:05:25.693689Z INFO ExtHandler ExtHandler Wire server endpoint:168.63.129.16
2023-06-08T14:05:25.708358Z INFO ExtHandler Fetched a new incarnation for the WireServer goal state [incarnation 1]
2023-06-08T14:05:25.717789Z INFO ExtHandler ExtHandler HostGAPlugin version: 1.0.8.140
2023-06-08T14:05:25.718263Z INFO ExtHandler
2023-06-08T14:05:25.718344Z INFO ExtHandler Fetched new vmSettings [HostGAPlugin correlation ID: b6aeb000-d61f-4704-9b2c-8161bb977efd eTag: 8531963493112184260 source: Fabric]
2023-06-08T14:05:25.718656Z INFO ExtHandler The vmSettings originated via Fabric; will ignore them.
2023-06-08T14:05:25.719224Z INFO ExtHandler
2023-06-08T14:05:25.719291Z INFO ExtHandler Fetching full goal state from the WireServer [incarnation 1]
2023-06-08T14:05:56.830472Z WARNING ExtHandler ExtHandler Fetch failed: [HttpError] [HTTP Failed] GET https://md-ssd-3jxkjrr4hm2n.z27.blob.storage.azure.net/$system/myCG.7e6a4b4a-d791-4427-a0cf-4b609d59f486.vmSettings -- IOError [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997) -- 6 attempts made
2023-06-08T14:05:56.863776Z INFO ExtHandler ExtHandler Default channel changed to HostGAPlugin channel.
2023-06-08T14:05:56.871862Z INFO ExtHandler Fetch goal state completed
2023-06-08T14:05:56.884689Z INFO ExtHandler ExtHandler WALinuxAgent-2.9.0.4 running as process 1417
2023-06-08T14:05:56.884829Z INFO ExtHandler ExtHandler ******** Extensions.GoalStatePeriod changed from its default: 6. New value: 0. Changing this value affects how often extensions are processed and status for the VM is reported. Too small a value may report the VM as unresponsive ********
2023-06-08T14:05:56.885188Z INFO ExtHandler ExtHandler ******** Extensions.InitialGoalStatePeriod changed from its default: 6. New value: 0. Changing this value affects how often extensions are processed and status for the VM is reported. Too small a value may report the VM as unresponsive ********
2023-06-08T14:05:56.885522Z INFO ExtHandler ExtHandler ******** OS.EnableFirewall is set to False, not processing the operation ********
2023-06-08T14:05:56.885819Z INFO ExtHandler ExtHandler ******** Extensions.Enabled is set to False, not processing the operation ********
2023-06-08T14:05:56.886099Z INFO ExtHandler ExtHandler ******** AutoUpdate.Enabled is set to False, not processing the operation ********
2023-06-08T14:05:56.887498Z INFO ExtHandler ExtHandler Cgroup monitoring is not supported on ['cacheguard', '2.0.2', 'UTM', 'CacheGuard']
2023-06-08T14:05:56.887858Z INFO ExtHandler ExtHandler Not setting up persistent firewall rules as OS.EnableFirewall=False
2023-06-08T14:05:56.888282Z INFO ExtHandler ExtHandler Checking if log collection is allowed at this time [False]. All three conditions must be met: configuration enabled [True], cgroups enabled [False], python supported: [True]
2023-06-08T14:05:56.889039Z INFO ExtHandler ExtHandler Starting env monitor service.
2023-06-08T14:05:56.889274Z INFO MonitorHandler ExtHandler WireServer endpoint 168.63.129.16 read from file
2023-06-08T14:05:56.889508Z INFO MonitorHandler ExtHandler Wire server endpoint:168.63.129.16
2023-06-08T14:05:56.889702Z INFO MonitorHandler ExtHandler Monitor.NetworkConfigurationChanges is disabled.
2023-06-08T14:05:56.890012Z INFO MonitorHandler ExtHandler Routing table from /proc/net/route:
Iface   Destination     Gateway         Flags   RefCnt  Use     Metric  Mask            MTU     Window  IRTT
eth0    00000000        0100000A        0003    0       0       0       00000000        0       0       0
eth0    0000000A        00000000        0001    0       0       0       00FFFFFF        0       0       0

2023-06-08T14:05:56.890848Z INFO ExtHandler ExtHandler Start SendTelemetryHandler service.
2023-06-08T14:05:56.891060Z INFO EnvHandler ExtHandler WireServer endpoint 168.63.129.16 read from file
2023-06-08T14:05:56.891349Z INFO ExtHandler ExtHandler Start Extension Telemetry service.
2023-06-08T14:05:56.891406Z INFO EnvHandler ExtHandler Wire server endpoint:168.63.129.16
2023-06-08T14:05:56.891684Z INFO EnvHandler ExtHandler Configure routes
2023-06-08T14:05:56.891296Z INFO SendTelemetryHandler ExtHandler Successfully started the SendTelemetryHandler thread
2023-06-08T14:05:56.891952Z INFO EnvHandler ExtHandler Gateway:None
2023-06-08T14:05:56.892068Z INFO EnvHandler ExtHandler Routes:None
2023-06-08T14:05:56.893113Z INFO TelemetryEventsCollector ExtHandler Extension Telemetry pipeline enabled: True
2023-06-08T14:05:56.893229Z INFO TelemetryEventsCollector ExtHandler Successfully started the TelemetryEventsCollector thread
2023-06-08T14:05:56.904688Z INFO ExtHandler ExtHandler Goal State Period: 300 sec. This indicates how often the agent checks for new goal states and reports status.
2023-06-08T14:05:56.912280Z INFO EnvHandler ExtHandler Set block dev timeout: sdb with timeout: 300

An additional question: Is it normal to have all those "INFO ExtHandler..." messages? During the submission for the certification we unticked the option "Supports VM extensions". In addition in /etc/waagent.conf we have "Extensions.Enabled=n".

Thanks

@cacheguard Is there an older waagent.log on the machine?

Also the INFO messages are expected

No, we deleted the waagent.log file and rebooted the VM to have a clean log. And as you can see the log level has been set to INFO. Why are you asking for INFO messages? What else should we activate?
Thanks

Just took a look at your original question again and we don't need additional logs.

I see this failure for all https requests in the logs:
[HttpError] [HTTP Failed] GET https://md-ssd-3jxkjrr4hm2n.z27.blob.storage.azure.net/$system/myCG.7e6a4b4a-d791-4427-a0cf-4b609d59f486.vmSettings -- IOError [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997) -- 6 attempts made

This is openSSL complaining about the SSL certificate when any https request is made. There is some configuration issue on the machine causing this, but no issue on the guest agent side.

Thank you for your post maddieford, Actually I think that there are two problems that we need to fix :

1- By default CacheGuard OS/Appliance uses it's own DNS server so even the name part of the requested URL can't be resolved. Question: does the agent requires that the VM be configured to use the Azure DNS server? It seems that in the context of those Web requests, the used DNS server is the Azure DNS server (and not the default VM's DNS server) because there isn't any reported error about the name resolution.

2- Yes, obviously the issue is not on the agent side. It's because the used Web client can't find the CA certificate used to sign the certificate associated to those Web requests. Using the curl command shows that the certificate is properly signed by a trusted CA certificate. Question : what Web client and/or library is used by the agent to make Web requests? An in what directory (or file) the agent (or the used library) expect to find CA certificates? (there isn't any issue on our OS regarding CA certificates. The OS just hasn't been developed to use the agent).

Best Regards,

We finally resolved our issue. by doing this:

  • First, we identified the openssl directory (OPENSSLDIR) in our OS by using the following command:
openssl  version -a
...
OPENSSLDIR: "/etc/ssl"
...
  • The CA bundle file in our OS is located at /etc/ca-bundle.crt so we created the symbolic link /etc/ssl/cert.pem linked to /etc/ca-bundle.crt). Below the command we simply used:

ln -sf /etc/ca-bundle.crt /etc/ssl/cert.pem

Thanks again