Commifreak/unraid-appdata.backup

Is it possible to honour the docker autostart delay?

Opened this issue · 14 comments

Updating docker containers will stop/update/start each container immediately and then proceed to stop/update/start the next; however, on older or low-power hardware this can end up with the containers updating but failing to start.

The Unraid docker implementation has an autostart delay built in, used to slow down the starting of containers after a reboot for exactly this reason. Is it possible to honour this delay before starting the next update procedure, so containers can update without manual intervention on this sort of setup, please?
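
For illustration, this is roughly the sequencing I'm asking for (a minimal sketch using the Python docker SDK; the autostart-delay lookup is hypothetical, since Unraid keeps those delays in its own config rather than in docker itself):

import time
import docker

client = docker.from_env()

# Hypothetical helper: Unraid stores the per-container autostart delay
# in its own config, not in docker itself, so real code would read it
# from there. Placeholder values only.
def get_autostart_delay(name: str) -> int:
    delays = {"network-container": 30, "container1": 10}
    return delays.get(name, 0)

def update_container(name: str) -> None:
    container = client.containers.get(name)
    image_ref = container.attrs["Config"]["Image"]
    container.stop()
    client.images.pull(image_ref)  # fetch the updated image
    # (recreating the container from the new image is omitted here)
    container.start()

for name in ["network-container", "container1"]:
    update_container(name)
    # Honour the autostart delay BEFORE moving to the next container,
    # so slow hardware can finish starting this one first.
    time.sleep(get_autostart_delay(name))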

on older or low-power hardware this can end up with the containers updating but failing to start.

How? These actions happen in order. I don't know why this should fail - and with what error message?
The container delay is respected at container start - but AFTER it has started.

If there are race conditions, it must be an issue with docker itself. Please provide an error message (syslog?)

I could add a static 3-second wait after the internal docker update. Should hurt nobody.

Actually, it looks like I made a huge assumption about why the containers were not starting, and the docker autostart delays are indeed honoured by appdata.backup. My sincere apologies.

I believe what is happening is that a container which provides the network for other containers is being updated, and as the dependent containers are recreated due to the networking change, appdata.backup can't find them to start them. Log snippet below.

[24.06.2024 03:02:55][ℹ️][network-container] Should NOT backup external volumes, sanitizing them...
[24.06.2024 03:02:55][ℹ️][network-container] Calculated volumes to back up: /mnt/user/appdata/network-container
[24.06.2024 03:02:55][ℹ️][network-container] Backing up network-container...
[24.06.2024 03:02:55][ℹ️][network-container] Backup created without issues
[24.06.2024 03:02:55][ℹ️][network-container] Verifying backup...
[24.06.2024 03:02:55][ℹ️][network-container] Installing planned update for network-container...
[24.06.2024 03:03:04][ℹ️][Main] Set containers to previous state
[24.06.2024 03:03:04][ℹ️][network-container] Starting network-container... (try #1) done!
[24.06.2024 03:03:08][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:03:13][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
[24.06.2024 03:03:18][ℹ️][container1] Starting container1... (try #3) Container did not started! - Code: No such container
[24.06.2024 03:03:18][❌][container1] Container did not started after multiple tries, skipping.
[24.06.2024 03:03:22][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:03:27][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
[24.06.2024 03:03:32][ℹ️][container1] Starting container1... (try #3) Container did not started! - Code: No such container
[24.06.2024 03:03:32][❌][container1] Container did not started after multiple tries, skipping.
[24.06.2024 03:03:35][ℹ️][container1] The container has a delay set, waiting 30 seconds before carrying on
[24.06.2024 03:04:05][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:04:10][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
[24.06.2024 03:04:15][ℹ️][container1] Starting container1... (try #3) Container did not started! - Code: No such container
[24.06.2024 03:04:15][❌][container1] Container did not started after multiple tries, skipping.
[24.06.2024 03:04:17][ℹ️][container1] The container has a delay set, waiting 90 seconds before carrying on
[24.06.2024 03:05:47][ℹ️][container1] Starting container1... (try #1) Container did not started! - Code: No such container
[24.06.2024 03:05:52][ℹ️][container1] Starting container1... (try #2) Container did not started! - Code: No such container
[24.06.2024 03:05:57][ℹ️][container1] Starting container1... (try #3) Container did not started! - Code: No such container
[24.06.2024 03:05:57][❌][container1] Container did not started after multiple tries, skipping.
[24.06.2024 03:06:00][ℹ️][container1] The container has a delay set, waiting 60 seconds before carrying on

When checking back in the Unraid GUI in the morning, all containers were present, just not started.

Again, sorry for the assumptions!

That's weird. But the names do not change, do they?

Nope, the names are the same; the rebuild only takes 30-60 seconds or so for all of the dependent containers.

The container ID does change, though, for all of the dependent containers.

The container ID is not being used for those actions. Please submit a debug log and share its ID. There are some internal debug messages for that case.
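
For illustration, this is the kind of name-based retry I mean (a simplified sketch using the Python docker SDK, not the actual plugin code): if the container is mid-recreate, the lookup raises NotFound, which would surface exactly as "No such container".

import time
import docker
from docker.errors import NotFound

client = docker.from_env()

def start_by_name(name: str, tries: int = 3, pause: float = 5.0) -> bool:
    for attempt in range(1, tries + 1):
        try:
            # Fresh lookup by NAME on every attempt - the ID may have
            # changed if the container was recreated in the meantime.
            client.containers.get(name).start()
            return True
        except NotFound:
            # "No such container": the container does not exist (yet),
            # e.g. because Unraid is still recreating it after its
            # network parent was updated.
            print(f"Starting {name}... (try #{attempt}) not found")
            time.sleep(pause)
    return False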

Hi,

Sorry for the delay, the most recent debug log didn't have the issue, but it reoccurred last night, so I've submitted the log and the ID is

24d6ffd8-670d-4af5-a111-ae1b129dccfb

I need to adjust my debug things inside the plugin to get a deeper look. Stay tuned.

I'm not sure if I have a similar problem or not. In my case everything was working fine for a few weeks (new user), and last night I received 27 emails telling me there were "backup issues", as some docker containers couldn't be restarted because "they didn't exist".

Event: Appdata Backup
Subject: [AppdataBackup] Error!
Description: Please check the backup log!
Importance: alert

Container did not started after multiple tries, skipping.

Six containers were impacted, and all of them were configured such that they had no network of their own - they were all slaved off GlueTun rather than being directly (or indirectly) connected to the network. All the other containers started with no issue.

The impacted containers started perfectly normally when manually started.

Logs uploaded - 6c11d71e-29a4-461d-9b1d-2d875f7d6f2e.

Just in case it helps.

The impacted containers started perfectly normally when manually started.

That's the fun part: the start mechanism is the same one Unraid uses. Not sure why it did not work during the backup :/

Having the same error:

Debug Log: 8edca3b9-9b4e-4833-9c8b-a291814ce2cc

Same thing here - this happens only with containers routed through another container for their network. Seems like it might just need a delay for the network to build across the containers before starting them? The error might be that the network container (in my case, qbittorrent with VPN) doesn't exist yet because it is rebuilding the network, as opposed to the dependent containers (e.g., the ones that have "--net=container:qbittorrent") not existing.
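
Something like this guard is what I have in mind (a hypothetical sketch using the Python docker SDK; "qbittorrent" and "container1" are placeholders from the examples above):

import time
import docker
from docker.errors import NotFound

client = docker.from_env()

def wait_for_network_parent(child_name: str, timeout: float = 120.0) -> None:
    child = client.containers.get(child_name)
    mode = child.attrs["HostConfig"]["NetworkMode"]  # e.g. "container:qbittorrent"
    if not mode.startswith("container:"):
        return  # the container has its own network, nothing to wait for
    parent_ref = mode.split(":", 1)[1]  # name or ID of the network parent
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if client.containers.get(parent_ref).status == "running":
                return
        except NotFound:
            pass  # parent is still being recreated
        time.sleep(5)
    raise TimeoutError(f"network parent {parent_ref} never came up")

wait_for_network_parent("container1")
client.containers.get("container1").start()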

Maybe. But the script does not wait after qbittorrent_music. Is there a delay set at all?

But No such container is saying the container is not there. A startup error caused by a missing network would produce a different message, as far as I know.

Seems like it might just need a delay for the network to build across the containers before starting them?

I'm not sure - all the containers linked to GlueTun immediately flag as "waiting for rebuild" when GlueTun is updated (so they all know they have to rebuild), and my starts are staggered with 10s delays. If that theory held, maybe the first one or two containers would fail to start while the rest worked fine, having had time to do their thing - but none of my containers restart. Conversely, if a rebuilding container were blocking subsequent containers from starting, then at least one container should start, with the others either all skipped or, if the delay between failed start attempts were honoured, some started and some not. The actual rebuild process I find is very quick, as there is no image to pull - in just seconds all of mine are rebuilt and ready to go.

Same problem.
Debug ID: 868ba415-051b-42e2-b406-3f626a882054