aws/amazon-ssm-agent

amazon-agent-worker, amazon-document-worker logs are missing in journal

arianvp opened this issue · 1 comments

Hey all

Currently the default seelog setup logs to both /var/log/amazon/ssm/amazon-ssm-agent.log and stdout.

However I am missing all worker-related logs in the systemd journal (Which by default captures stdout).

The problem is that the worker processes seem to be forked off and their stdout is not redirected or captured to the main amazon-ssm-agent process. This means that all worker-related logs are missing in journalctl -u amazon-ssm-agent.service.

The worker logs do end up in /var/log/amazon/ssm/amazon-ssm-agent.log as both the agent and the workers seem to use the same seelog config. However this onto itself seems problematic to me. Having multiple processes write to the same log file is inherently racey and could corrupt the log file potentially as multiple instances of seelog are writing to and rotating the log file..

Is there a way to get all ssm-related logs in the journal?

Also is it actually safe that all the worker processes are currently writing to the same log file concurrently? It sounds problematic to me

Some suggestions:

  • Don't spawn worker processes yourself, but create amazon-agent-worker.service systemd unit files
  • Capture the stdout/stderr of worker processes and forward them to amazon-ssm-agent stderr/stdout?

Journal log (Note all worker-related things are missing):

$ sudo journalctl -o cat -u amazon-ssm-agent.service
Started Agent to enable remote management of your Amazon EC2 instance configuration.
Initializing new seelog logger
New Seelog Logger Creation Complete
2024/06/10 09:10:28 Found config file at /etc/amazon/ssm/amazon-ssm-agent.json.
Applying config override from /etc/amazon/ssm/amazon-ssm-agent.json.
2024/06/10 09:10:28 processing appconfig overrides
2024/06/10 09:10:28 Found config file at /etc/amazon/ssm/amazon-ssm-agent.json.
Applying config override from /etc/amazon/ssm/amazon-ssm-agent.json.
2024/06/10 09:10:28 processing appconfig overrides
2024-06-10 09:10:28 INFO Proxy environment variables:
2024/06/10 09:10:28 Found config file at /etc/amazon/ssm/amazon-ssm-agent.json.
Applying config override from /etc/amazon/ssm/amazon-ssm-agent.json.
2024/06/10 09:10:28 processing appconfig overrides
2024/06/10 09:10:28 Found config file at /etc/amazon/ssm/amazon-ssm-agent.json.
Applying config override from /etc/amazon/ssm/amazon-ssm-agent.json.
2024/06/10 09:10:28 processing appconfig overrides
2024-06-10 09:10:28 INFO https_proxy:
2024-06-10 09:10:28 INFO http_proxy:
2024-06-10 09:10:28 INFO no_proxy:
2024-06-10 09:10:28 INFO Checking if agent identity type OnPrem can be assumed
2024-06-10 09:10:28 INFO Checking if agent identity type EC2 can be assumed
2024-06-10 09:10:28 INFO Agent will take identity from EC2
2024-06-10 09:10:28 INFO [amazon-ssm-agent] using named pipe channel for IPC
2024-06-10 09:10:28 INFO [amazon-ssm-agent] using named pipe channel for IPC
2024-06-10 09:10:28 INFO [amazon-ssm-agent] using named pipe channel for IPC
2024-06-10 09:10:28 INFO [amazon-ssm-agent] amazon-ssm-agent - v3.3.131.0
2024-06-10 09:10:28 INFO [amazon-ssm-agent] OS: linux, Arch: arm64
2024-06-10 09:10:28 INFO [amazon-ssm-agent] Starting Core Agent
2024-06-10 09:10:28 INFO [amazon-ssm-agent] Registrar detected. Attempting registration
2024-06-10 09:10:28 INFO [Registrar] Starting registrar module
2024-06-10 09:10:28 INFO [EC2Identity] Checking disk for registration info
2024-06-10 09:10:28 INFO [EC2Identity] No registration info found for ec2 instance, attempting registration
2024-06-10 09:10:28 INFO [EC2Identity] Generating registration keypair
2024-06-10 09:10:28 INFO [EC2Identity] Checking write access before registering
2024-06-10 09:10:28 INFO [EC2Identity] Registering EC2 instance with Systems Manager
2024-06-10 09:10:28 INFO [EC2Identity] EC2 registration was successful.
2024-06-10 09:10:28 INFO [amazon-ssm-agent] Registration attempted. Resuming core agent startup.
2024-06-10 09:10:28 INFO [CredentialRefresher] credentialRefresher has started
2024-06-10 09:10:28 INFO [CredentialRefresher] Starting credentials refresher loop
2024-06-10 09:10:28 INFO EC2RoleProvider Successfully connected with instance profile role credentials
2024-06-10 09:10:28 INFO [CredentialRefresher] Credentials ready
2024-06-10 09:10:28 INFO [CredentialRefresher] Next credential rotation will be in 29.99999340746667 minutes
2024-06-10 09:10:29 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] [WorkerProvider] Worker ssm-agent-worker is not running, starting worker process
2024-06-10 09:10:30 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] [WorkerProvider] Worker ssm-agent-worker (pid:983) started
2024-06-10 09:10:30 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] Monitor long running worker health every 60 seconds

vs /var/log/amazon/ssm/amazon-ssm-agent.log

2024-06-10 09:10:28 INFO Proxy environment variables:
2024-06-10 09:10:28 INFO https_proxy:
2024-06-10 09:10:28 INFO http_proxy:
2024-06-10 09:10:28 INFO no_proxy:
2024-06-10 09:10:28 INFO Checking if agent identity type OnPrem can be assumed
2024-06-10 09:10:28 INFO Checking if agent identity type EC2 can be assumed
2024-06-10 09:10:28 INFO Agent will take identity from EC2
2024-06-10 09:10:28 INFO [amazon-ssm-agent] using named pipe channel for IPC
2024-06-10 09:10:28 INFO [amazon-ssm-agent] using named pipe channel for IPC
2024-06-10 09:10:28 INFO [amazon-ssm-agent] using named pipe channel for IPC
2024-06-10 09:10:28 INFO [amazon-ssm-agent] amazon-ssm-agent - v3.3.131.0
2024-06-10 09:10:28 INFO [amazon-ssm-agent] OS: linux, Arch: arm64
2024-06-10 09:10:28 INFO [amazon-ssm-agent] Starting Core Agent
2024-06-10 09:10:28 INFO [amazon-ssm-agent] Registrar detected. Attempting registration
2024-06-10 09:10:28 INFO [Registrar] Starting registrar module
2024-06-10 09:10:28 INFO [EC2Identity] Checking disk for registration info
2024-06-10 09:10:28 INFO [EC2Identity] No registration info found for ec2 instance, attempting registration
2024-06-10 09:10:28 INFO [EC2Identity] Generating registration keypair
2024-06-10 09:10:28 INFO [EC2Identity] Checking write access before registering
2024-06-10 09:10:28 INFO [EC2Identity] Registering EC2 instance with Systems Manager
2024-06-10 09:10:28 INFO [EC2Identity] EC2 registration was successful.
2024-06-10 09:10:28 INFO [amazon-ssm-agent] Registration attempted. Resuming core agent startup.
2024-06-10 09:10:28 INFO [CredentialRefresher] credentialRefresher has started
2024-06-10 09:10:28 INFO [CredentialRefresher] Starting credentials refresher loop
2024-06-10 09:10:28 INFO EC2RoleProvider Successfully connected with instance profile role credentials
2024-06-10 09:10:28 INFO [CredentialRefresher] Credentials ready
2024-06-10 09:10:28 INFO [CredentialRefresher] Next credential rotation will be in 29.99999340746667 minutes
2024-06-10 09:10:29 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] [WorkerProvider] Worker ssm-agent-worker is not running, starting worker process
2024-06-10 09:10:30 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] [WorkerProvider] Worker ssm-agent-worker (pid:983) started
2024-06-10 09:10:30 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] Monitor long running worker health every 60 seconds
2024-06-10 09:10:31 INFO [ssm-agent-worker] Checking if agent identity type OnPrem can be assumed
2024-06-10 09:10:31 INFO [ssm-agent-worker] Checking if agent identity type EC2 can be assumed
2024-06-10 09:10:31 INFO [ssm-agent-worker] Agent will take identity from EC2
2024-06-10 09:10:31 INFO [ssm-agent-worker] using named pipe channel for IPC
2024-06-10 09:10:31 INFO [ssm-agent-worker] Dial to Core Agent broadcast channel
2024-06-10 09:10:31 INFO [ssm-agent-worker] using named pipe channel for IPC
2024-06-10 09:10:31 INFO [ssm-agent-worker] Create new startup processor
2024-06-10 09:10:31 INFO [ssm-agent-worker] Dial to Core Agent broadcast channel
2024-06-10 09:10:31 INFO [ssm-agent-worker] using named pipe channel for IPC
2024-06-10 09:10:31 INFO [ssm-agent-worker] Start to listen to Core Agent health channel
2024-06-10 09:10:31 INFO [ssm-agent-worker] Start to listen to Core Agent termination channel
2024-06-10 09:10:31 INFO [ssm-agent-worker] [StartupProcessor] Executing startup processor tasks
2024-06-10 09:10:31 INFO [ssm-agent-worker] [StartupProcessor] Write to serial port: Amazon SSM Agent v3.3.131.0 is running
2024-06-10 09:10:31 INFO [ssm-agent-worker] [StartupProcessor] Write to serial port: OsProductName: NixOS
2024-06-10 09:10:31 INFO [ssm-agent-worker] [StartupProcessor] Write to serial port: OsVersion: 24.05
2024-06-10 09:10:31 INFO [ssm-agent-worker] [MessageService] Appending MGSInteractor to MessageService interactors
2024-06-10 09:10:31 INFO [ssm-agent-worker] [MessageService] Appending MDSInteractor to MessageService interactors
2024-06-10 09:10:31 INFO [ssm-agent-worker] [LongRunningPluginsManager] registered plugins: {}
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing bookkeeping folders
2024-06-10 09:10:31 INFO [ssm-agent-worker] removing the completed state files
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing bookkeeping folders for long running plugins
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing replies folder for MDS reply requests that couldn't reach the service
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing replies folder for MGS reply requests that couldn't reach the service
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing healthcheck folders for long running plugins
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing locations for inventory plugin
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing default location for custom inventory
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing default location for file inventory
2024-06-10 09:10:31 INFO [ssm-agent-worker] Initializing default location for role inventory

In general having logs show up in the journal would simplify system administration a lot as i have everything in one place.