Azure/WALinuxAgent

[BUG] /var/lib/waagent/history taking up disk space

KuSh opened this issue · 14 comments

KuSh commented

/var/lib/waagent/history takes up disk space. It contains 1263464 files taking up 20GB.

# ls -1 /var/lib/waagent/history | wc -l
1263464
# du -sh /var/lib/waagent/history 2>/dev/null
20G	/var/lib/waagent/history

Distro and WALinuxAgent details (please complete the following information):

  • Distro and Version: Ubuntu 22.04
  • WALinuxAgent version
/usr/sbin/waagent:27: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
WALinuxAgent-2.2.46 running on ubuntu 22.04
Python: 3.10.6
Goal state agent: 2.2.46

Log file attached
If possible, please provide the full /var/log/waagent.log file to help us understand the problem better and get the context of the issue.
The log is full of the same lines :

2023-06-05T16:18:26.461436Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:18:26.467519Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:18:26.479238Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:18:26.491881Z INFO ExtHandler Retry detect protocols: retry=258
2023-06-05T16:18:36.506814Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:18:36.513326Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:18:36.518117Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:18:36.522790Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:18:36.530545Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:18:36.536644Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:18:36.541396Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-05T16:18:36.905773Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:18:36.911830Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:18:36.923324Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:18:36.935750Z INFO ExtHandler Retry detect protocols: retry=259
2023-06-05T16:18:46.950708Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:18:46.957058Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:18:46.961926Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:18:46.966761Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:18:46.974642Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:18:46.980745Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:18:46.985510Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-05T16:18:47.391438Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:18:47.397516Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:18:47.409258Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:18:47.421506Z INFO ExtHandler Retry detect protocols: retry=260
2023-06-05T16:18:57.436627Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:18:57.443070Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:18:57.448002Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:18:57.452518Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:18:57.460228Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:18:57.466212Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:18:57.471221Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-05T16:18:57.784699Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:18:57.790682Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:18:57.801999Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:18:57.814302Z INFO ExtHandler Retry detect protocols: retry=261
2023-06-05T16:19:07.827648Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:19:07.834202Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:19:07.839029Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:19:07.843738Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:19:07.851647Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:19:07.857456Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:19:07.862201Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-05T16:19:08.385468Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:19:08.391785Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:19:08.403696Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:19:08.416142Z INFO ExtHandler Retry detect protocols: retry=262
2023-06-05T16:19:18.431431Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:19:18.437793Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:19:18.442681Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:19:18.447416Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:19:18.455172Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:19:18.461181Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:19:18.465981Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-05T16:19:18.931679Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:19:18.937940Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:19:18.949539Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:19:18.961922Z INFO ExtHandler Retry detect protocols: retry=263
2023-06-05T16:19:28.967774Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:19:28.974222Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:19:28.979193Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:19:28.983903Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:19:28.991942Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:19:28.997854Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:19:29.002804Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-05T16:19:29.509588Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:19:29.515638Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:19:29.527222Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:19:29.539680Z INFO ExtHandler Retry detect protocols: retry=264
2023-06-05T16:19:39.551651Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:19:39.558109Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:19:39.563153Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:19:39.567919Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:19:39.576132Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:19:39.582168Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:19:39.587064Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-05T16:19:40.135079Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:19:40.141299Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:19:40.153113Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:19:40.165547Z INFO ExtHandler Retry detect protocols: retry=265
2023-06-05T16:19:50.180569Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:19:50.187195Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:19:50.192353Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:19:50.197232Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:19:50.205120Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:19:50.211259Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:19:50.216139Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-05T16:19:50.671650Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-05T16:19:50.677806Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-05T16:19:50.689646Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-05T16:19:50.702160Z INFO ExtHandler Retry detect protocols: retry=266
2023-06-05T16:20:00.711736Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-05T16:20:00.718395Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-05T16:20:00.723445Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-05T16:20:00.728345Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-05T16:20:00.736203Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-05T16:20:00.742465Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-05T16:20:00.747684Z INFO ExtHandler Server preferred version:2015-04-05

@KuSh 2.2.46 is a very old Agent. Current versions should not have this issue. Is agent autoupdate disabled on your VM?

KuSh commented

@KuSh 2.2.46 is a very old Agent. Current versions should not have this issue. Is agent autoupdate disabled on your VM?

Not that I'm aware of. I'm using standard 22_04-lts-arm64 plan with unattended-upgrades enabled and didn't configure agent

KuSh commented

I did found previous similar issue but since it was marked fixed since 2021 and the VM was created in march this year I didn't thought to check which agent version included the fix

KuSh commented

So it seems to be the package provided by jammy :

$ apt-cache policy walinuxagent
walinuxagent:
  Installed: 2.2.46-0ubuntu5.1
  Candidate: 2.2.46-0ubuntu5.1
  Version table:
 *** 2.2.46-0ubuntu5.1 500
        500 http://ports.ubuntu.com/ubuntu-ports jammy-updates/main arm64 Packages
        100 /var/lib/dpkg/status
     2.2.46-0ubuntu5 500
        500 http://ports.ubuntu.com/ubuntu-ports jammy/main arm64 Packages

And your advice is that "Installation via your distribution's package repository is preferred." So what can we do ?

I'll try to open a ticket on ubuntu side

KuSh commented

I've checked another vm that is identical except that it is amd64 based and the autoupdate did work.
On this arm based vm, it sticks to 2.2.46. restarting nor reinstalling didn't fix the problem. I ended up installing it via git to resolve the issue

AutoUpdate is enabled

$ waagent -show-configuration 2>/dev/null | grep -i autoupdate
AutoUpdate.Enabled = True
AutoUpdate.GAFamily = Prod
Autoupdate.Frequency = 3600

@KuSh What region is this VM in? We're currently rolling out an arm64 agent which has the fix for this. See deployment timeline here: https://github.com/Azure/WALinuxAgent/releases/tag/pre-v2.9.1.1

KuSh commented

@KuSh What region is this VM in? We're currently rolling out an arm64 agent which has the fix for this. See deployment timeline here: https://github.com/Azure/WALinuxAgent/releases/tag/pre-v2.9.1.1

France-central

We published 2.9.1.1 to france central yesterday. Could you please confirm you have autoupdate enabled and your machine takes 2.9.1.1.

Seems like we are seeing the same exact issue, in some VMs in China East 2 region. We're running the same versions of Ubuntu, walinuxagent as above. From my limited testing, the trigger for this was me trying to run a managed run-command using this:

> az vm run-command create --resource-group RESOURCE-GROUP --vm-name VM-NAME --async-execution false --name keklinke-test-cmd --script "whoami"

(VMExtensionProvisioningError) The requested operation requires features that are not supported by the version of the VM agent running in the VM. Unsupported features: 'https://aka.ms/VMExtensionLinuxAgentUpdate'.

More information on troubleshooting is available at MultipleExtensionsPerHandler
Code: VMExtensionProvisioningError
Message: The requested operation requires features that are not supported by the version of the VM agent running in the VM. Unsupported features: 'https://aka.ms/VMExtensionLinuxAgentUpdate'.

More information on troubleshooting is available at MultipleExtensionsPerHandler

It gives an error about using an old agent version, which makes sense since we're running 2.2.46. But after that fails, the agent is no longer responding or ready. /var/log/waagent.log shows this (repeating over and over):

2023-06-08T17:07:14.850357Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:14.936695Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:15.015115Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:15.025121Z WARNING ExtHandler Exception retrieving extension handlers: [ProtocolError] Exceeded max retry updating goal state
2023-06-08T17:07:15.030412Z ERROR ExtHandler Event: name=WALinuxAgent, op=ExtensionProcessing, message=Exception retrieving extension handlers: [ProtocolError] Exceeded max retry updating goal state [<FrameSummary file /usr/lib/python3/dist-packages/azurelinuxagent/ga/exthandlers.py, line 230 in run>, <FrameSummary file /usr/lib/python3/dist-packages/azurelinuxagent/common/protocol/wire.py, line 151 in get_ext_handlers>, <FrameSummary file /usr/lib/python3/dist-packages/azurelinuxagent/common/protocol/wire.py, line 781 in update_goal_state>, <FrameSummary file /usr/lib/python3/dist-packages/azurelinuxagent/common/protocol/wire.py, line 845 in _update_from_goal_state>], duration=0

After that happened, I tried restarting the service by running sudo service walinuxagent restart, and it starts failing with a different error and never really initializes properly:

2023-06-08T17:07:24.880900Z INFO Daemon Agent WALinuxAgent-2.2.46 forwarding signal 15 to WALinuxAgent-2.2.46
2023-06-08T17:07:25.109711Z INFO Daemon Azure Linux Agent Version:2.2.46
2023-06-08T17:07:25.111855Z INFO Daemon OS: ubuntu 22.04
2023-06-08T17:07:25.113736Z INFO Daemon Python: 3.10.6
2023-06-08T17:07:25.114657Z INFO Daemon CGroups Status: The cgroup filesystem is ready to use
2023-06-08T17:07:25.117407Z WARNING Daemon Failed to create a cgroup for the VM Agent; resource usage for the Agent will not be tracked. Error: [CGroupsException] Failed to get paths of agent's cgroups. Error: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
2023-06-08T17:07:25.118744Z INFO Daemon Run daemon
2023-06-08T17:07:25.125028Z INFO Daemon cloud-init is enabled: True
2023-06-08T17:07:25.127181Z INFO Daemon Using cloud-init for provisioning
2023-06-08T17:07:25.128531Z INFO Daemon Clean protocol and wireserver endpoint
2023-06-08T17:07:25.129925Z INFO Daemon Provisioning already completed, skipping.
2023-06-08T17:07:25.130278Z INFO Daemon RDMA capabilities are not enabled, skipping
2023-06-08T17:07:25.136307Z INFO Daemon Installed Agent WALinuxAgent-2.2.46 is the most current agent
2023-06-08T17:07:25.302584Z INFO ExtHandler Agent WALinuxAgent-2.2.46 is running as the goal state agent
2023-06-08T17:07:25.305250Z INFO ExtHandler Distro info: ubuntu 22.04, osutil class being used: UbuntuOSUtil, agent service name: walinuxagent
2023-06-08T17:07:25.308589Z INFO ExtHandler Detect protocol endpoints
2023-06-08T17:07:25.309158Z INFO ExtHandler Clean protocol and wireserver endpoint
2023-06-08T17:07:25.310759Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-08T17:07:25.311769Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-08T17:07:25.312876Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-08T17:07:25.313779Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-08T17:07:27.332689Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-08T17:07:27.334967Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-08T17:07:27.336013Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-08T17:07:27.712377Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:27.793550Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:27.873378Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:27.882818Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-08T17:07:27.885082Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-08T17:07:27.890309Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-08T17:07:27.892297Z INFO ExtHandler Retry detect protocols: retry=0
2023-06-08T17:07:37.904213Z INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2023-06-08T17:07:37.906977Z INFO ExtHandler Test for route to 168.63.129.16
2023-06-08T17:07:37.909119Z INFO ExtHandler Route to 168.63.129.16 exists
2023-06-08T17:07:37.910138Z INFO ExtHandler Wire server endpoint:168.63.129.16
2023-06-08T17:07:37.929077Z INFO ExtHandler Fabric preferred wire protocol version:2015-04-05
2023-06-08T17:07:37.931572Z INFO ExtHandler Wire protocol version:2012-11-30
2023-06-08T17:07:37.933522Z INFO ExtHandler Server preferred version:2015-04-05
2023-06-08T17:07:38.331606Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:38.408878Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:38.486851Z INFO ExtHandler Found private key matching thumbprint <omitted>
2023-06-08T17:07:38.495784Z INFO ExtHandler WireServer is not responding. Reset dhcp endpoint
2023-06-08T17:07:38.496105Z INFO ExtHandler Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2023-06-08T17:07:38.516567Z INFO ExtHandler Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2023-06-08T17:07:38.519429Z INFO ExtHandler Retry detect protocols: retry=1

I confirmed that autoupdate is enabled, but it doesn't seem like it's working since we're still running the old version:

$ waagent -show-configuration 2>/dev/null | grep -i autoupdate
AutoUpdate.Enabled = True
AutoUpdate.GAFamily = Prod
Autoupdate.Frequency = 3600

And for reference, here is the version info:

$ apt-cache policy walinuxagent
walinuxagent:
  Installed: 2.2.46-0ubuntu5.1
  Candidate: 2.2.46-0ubuntu5.1
  Version table:
 *** 2.2.46-0ubuntu5.1 500
        500 http://azure.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.2.46-0ubuntu5 500
        500 http://azure.archive.ubuntu.com/ubuntu jammy/main amd64 Packages

And from here I'm stuck. Any suggestions? Do I need to reinstall the agent from github?

KuSh commented

We published 2.9.1.1 to france central yesterday. Could you please confirm you have autoupdate enabled and your machine takes 2.9.1.1.

I did install latest release by cloning repo so I can't tell.

Quick update on my issue - I tried installing version 2.9.0.4 manually from GitHub and it seems to have worked. When will version 2.9.1.1 be released to China East 2? I can verify that ours auto-updates after that date.

@keklinke We don't have an exact date for when 2.9.1.1 will be released to China East 2. We're currently working through publishing issues there