Azure/WALinuxAgent

[BUG] The agent goes offline randomly. "ERROR ExtHandler ExtHandler Error fetching the goal state" in waagent.log


Describe the bug: A clear and concise description of what the bug is.
I use an Azure VMSS as an agent pool in Azure DevOps. After migrating from Ubuntu 18.04 to Ubuntu 22.04 (and installing all required software), an agent (VMSS instance) sometimes goes offline while running a job. The pipeline stops ([error] We stopped hearing from agent ). The agent is shown as unhealthy in my agent pool and is then deleted. I kept one unhealthy instance to investigate, and found some errors in /var/log/waagent.log

Note: Please add some context which would help us understand the problem better

  1. Section of the log where the error occurs.
    2023-10-27T08:36:03.202629Z ERROR ExtHandler ExtHandler Error fetching the goal state: [ProtocolError] Error fetching goal state: [ResourceGoneError] [HTTP Failed] [410: Gone] b'\n\n ResourceNotAvailable\n The resource requested is no longer available. Please refresh your cache.\n
    \n'
    Traceback (most recent call last):
    File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/protocol/wire.py", line 788, in update_goal_state
    self._goal_state.update(silent=silent)
    File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/protocol/goal_state.py", line 209, in update
    self._update(force_update=False)
    File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/protocol/goal_state.py", line 224, in _update
    incarnation, xml_text, xml_doc = GoalState._fetch_goal_state(self._wire_client)
    File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/protocol/goal_state.py", line 352, in _fetch_goal_state
    xml_text = wire_client.fetch_config(uri, wire_client.get_header())
    File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/protocol/wire.py", line 571, in fetch_config
    resp = self.call_wireserver(restutil.http_get, uri, headers=headers)
    File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/protocol/wire.py", line 546, in call_wireserver
    resp = http_req(*args, **kwargs)
    File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/utils/restutil.py", line 528, in http_get
    return http_request("GET",
    File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/utils/restutil.py", line 478, in http_request
    raise ResourceGoneError(response_error)
    azurelinuxagent.common.exception.ResourceGoneError: [ResourceGoneError] [HTTP Failed] [410: Gone] b'\n\n ResourceNotAvailable\n The resource requested is no longer available. Please refresh your cache.\n
    \n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/ga/update.py", line 492, in _try_update_goal_state
protocol.client.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
File "/var/lib/waagent/WALinuxAgent-2.9.1.1/bin/WALinuxAgent-2.9.1.1-py3.8.egg/azurelinuxagent/common/protocol/wire.py", line 793, in update_goal_state
raise ProtocolError("Error fetching goal state: {0}".format(ustr(exception)))
azurelinuxagent.common.exception.ProtocolError: [ProtocolError] Error fetching goal state: [ResourceGoneError] [HTTP Failed] [410: Gone] b'\n\n ResourceNotAvailable\n The resource requested is no longer available. Please refresh your cache.\n

\n'

  2. Serial console output
  3. Steps to reproduce the behavior.

Distro and WALinuxAgent details (please complete the following information):

  • Distro and Version: [e.g. Ubuntu 16.04]
    PRETTY_NAME="Ubuntu 22.04.3 LTS"
    NAME="Ubuntu"
    VERSION_ID="22.04"
    VERSION="22.04.3 LTS (Jammy Jellyfish)"
    VERSION_CODENAME=jammy
  • WALinuxAgent version [e.g. 2.2.40, you can copy the output of waagent --version, more info here ]
    waagent --version
    /usr/sbin/waagent:27: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
    import imp
    WALinuxAgent-2.2.46 running on ubuntu 22.04
    Python: 3.10.12
    Goal state agent: 2.9.1.1

Additional context
Add any other context about the problem here.
It seems to appear after an npm job, or a Docker job with an npm instruction, but I'm not sure.

Log file attached
If possible, please provide the full /var/log/waagent.log file to help us understand the problem better and get the context of the issue.
waagent_agent_30.log

@mahmoudghorbelMG
Those errors tend to be transitory, and in this case the error went away a few seconds later:

2023-10-27T08:36:49.318851Z INFO ExtHandler ExtHandler Fetching the goal state recovered from previous errors. Fetched etag_13848932411473871259

Here the error did not affect the functionality of the WALinuxAgent at all, since the agent was idle at that point. When the error went away, the agent fetched an operation (etag_13848932411473871259) that it had already processed, so it went back to idle.

The operation had been processed at

2023-10-27T07:57:10.702114Z INFO ExtHandler ExtHandler ProcessExtensionsGoalState completed [etag_13848932411473871259 4082 ms]

So, this error is unrelated to the failures in your VMSS instance.
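
If you want to run the same check on other instances, here is a minimal Python sketch that scans waagent.log lines, pairs each "Error fetching the goal state" with the following "recovered" line, and reports whether the recovered etag had already been processed. This is not part of WALinuxAgent. The function name `summarize_goal_state_errors` and the regular expressions are assumptions based only on the three log lines quoted in this thread, so other agent versions may need different patterns.

```python
import re
from datetime import datetime

# Patterns inferred from the log lines quoted in this thread (assumption:
# other agent versions may word these messages differently).
TS = r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)"
ERROR_RE = re.compile(TS + r" ERROR ExtHandler ExtHandler Error fetching the goal state")
RECOVERED_RE = re.compile(
    TS + r" INFO ExtHandler ExtHandler Fetching the goal state recovered "
    r"from previous errors\. Fetched (?P<etag>etag_\d+)"
)
COMPLETED_RE = re.compile(
    TS + r" INFO ExtHandler ExtHandler ProcessExtensionsGoalState completed "
    r"\[(?P<etag>etag_\d+) \d+ ms\]"
)


def parse_ts(text):
    """Parse a waagent.log timestamp such as 2023-10-27T08:36:03.202629Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def summarize_goal_state_errors(lines):
    """Return one dict per error-to-recovery interval found in `lines`."""
    completed = {}       # etag -> time it was first reported as processed
    outage_start = None  # time of the first error in the current interval
    outages = []
    for line in lines:
        m = COMPLETED_RE.search(line)
        if m:
            completed.setdefault(m.group("etag"), parse_ts(m.group("ts")))
            continue
        m = ERROR_RE.search(line)
        if m:
            if outage_start is None:
                outage_start = parse_ts(m.group("ts"))
            continue
        m = RECOVERED_RE.search(line)
        if m and outage_start is not None:
            recovered = parse_ts(m.group("ts"))
            etag = m.group("etag")
            outages.append({
                "start": outage_start,
                "recovered": recovered,
                "seconds": (recovered - outage_start).total_seconds(),
                "etag": etag,
                # None means the etag was not seen as processed earlier in the log.
                "already_processed_at": completed.get(etag),
            })
            outage_start = None
    return outages


# The three lines quoted in this thread (the error line is truncated here).
sample = [
    "2023-10-27T07:57:10.702114Z INFO ExtHandler ExtHandler ProcessExtensionsGoalState completed [etag_13848932411473871259 4082 ms]",
    "2023-10-27T08:36:03.202629Z ERROR ExtHandler ExtHandler Error fetching the goal state: [ProtocolError] Error fetching goal state: [ResourceGoneError] [HTTP Failed] [410: Gone]",
    "2023-10-27T08:36:49.318851Z INFO ExtHandler ExtHandler Fetching the goal state recovered from previous errors. Fetched etag_13848932411473871259",
]
for outage in summarize_goal_state_errors(sample):
    print(outage)
```

On the lines above, the sketch reports a single interruption of about 46 seconds whose etag had already been processed at 07:57:10. To run it against a real log, pass the open file instead of `sample`, for example `summarize_goal_state_errors(open("/var/log/waagent.log"))`.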