timescale/promscale

After system restart, not all containers started

KES777 opened this issue · 6 comments

Describe the bug
I restarted the system, but not all containers started.

To Reproduce

$ reboot
$ docker compose ps
kes@work ~/o/server/monitoring2 $ docker compose ps
NAME                           COMMAND                  SERVICE             STATUS              PORTS
monitoring2-alertmanager-1     "/bin/alertmanager -…"   alertmanager        exited (255)        0.0.0.0:9093->9093/tcp, :::9093->9093/tcp
monitoring2-db-1               "/docker-entrypoint.…"   db                  running             8008/tcp, 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp, 8081/tcp
monitoring2-grafana-1          "/run.sh"                grafana             running             0.0.0.0:3000->3000/tcp, :::3000->3000/tcp
monitoring2-node_exporter-1    "/bin/node_exporter …"   node_exporter       running             0.0.0.0:9100->9100/tcp, :::9100->9100/tcp
monitoring2-otel-collector-1   "/otelcol --config=/…"   otel-collector      exited (0)          
monitoring2-prometheus-1       "/bin/prometheus --c…"   prometheus          exited (255)        0.0.0.0:9090->9090/tcp, :::9090->9090/tcp
monitoring2-promscale-1        "/promscale"             promscale           running             0.0.0.0:9201-9202->9201-9202/tcp, :::9201-9202->9201-9202/tcp

Expected behavior
After a manual stop/start, everything is fine:

kes@work ~/o/server/monitoring2 $ docker compose stop
[+] Running 7/7
 ⠿ Container monitoring2-alertmanager-1    Stopped                                   0.0s
 ⠿ Container monitoring2-otel-collector-1  Stopped                                   0.0s
 ⠿ Container monitoring2-node_exporter-1   Stopped                                   2.2s
 ⠿ Container monitoring2-grafana-1         Stop...                                   1.6s
 ⠿ Container monitoring2-prometheus-1      S...                                      0.0s
 ⠿ Container monitoring2-promscale-1       St...                                     0.4s
 ⠿ Container monitoring2-db-1              Stopped                                  10.2s
kes@work ~/o/server/monitoring2 $ docker compose start
[+] Running 7/7
 ⠿ Container monitoring2-alertmanager-1    Started                                   3.6s
 ⠿ Container monitoring2-node_exporter-1   Started                                   2.3s
 ⠿ Container monitoring2-db-1              Started                                   5.3s
 ⠿ Container monitoring2-otel-collector-1  Started                                   5.1s
 ⠿ Container monitoring2-promscale-1       St...                                     2.1s
 ⠿ Container monitoring2-prometheus-1      S...                                      1.9s
 ⠿ Container monitoring2-grafana-1         Star...                                   3.1s
kes@work ~/o/server/monitoring2 $ docker compose ps
NAME                           COMMAND                  SERVICE             STATUS              PORTS
monitoring2-alertmanager-1     "/bin/alertmanager -…"   alertmanager        running             0.0.0.0:9093->9093/tcp, :::9093->9093/tcp
monitoring2-db-1               "/docker-entrypoint.…"   db                  running             8008/tcp, 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp, 8081/tcp
monitoring2-grafana-1          "/run.sh"                grafana             running             0.0.0.0:3000->3000/tcp, :::3000->3000/tcp
monitoring2-node_exporter-1    "/bin/node_exporter …"   node_exporter       running             0.0.0.0:9100->9100/tcp, :::9100->9100/tcp
monitoring2-otel-collector-1   "/otelcol --config=/…"   otel-collector      running             4317/tcp, 55678-55679/tcp, 0.0.0.0:14268->14268/tcp, :::14268->14268/tcp
monitoring2-prometheus-1       "/bin/prometheus --c…"   prometheus          running             0.0.0.0:9090->9090/tcp, :::9090->9090/tcp
monitoring2-promscale-1        "/promscale"             promscale           running             0.0.0.0:9201-9202->9201-9202/tcp, :::9201-9202->9201-9202/tcp

Configuration (as applicable)
monitoring2.zip

Version

  • Distribution/OS: Linux Mint 20.3
  • Promscale: latest
  • TimescaleDB: latest

The problem is probably related to Docker service startup. In my case, sudo systemctl start docker takes up to 5 minutes.

Logs from exited containers:

$ docker compose logs -f alertmanager
ts=2022-10-13T05:31:51.705Z caller=main.go:231 level=info msg="Starting Alertmanager" version="(version=0.24.0, branch=HEAD, revision=f484b17fa3c583ed1b2c8bbcec20ba1db2aa5f11)"
ts=2022-10-13T05:31:51.705Z caller=main.go:232 level=info build_context="(go=go1.17.8, user=root@265f14f5c6fc, date=20220325-09:31:33)"
ts=2022-10-13T05:31:51.706Z caller=cluster.go:185 level=info component=cluster msg="setting advertise address explicitly" addr=192.168.129.67 port=9094
ts=2022-10-13T05:31:51.708Z caller=cluster.go:680 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
ts=2022-10-13T05:31:51.738Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/alertmanager.yml
ts=2022-10-13T05:31:51.738Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/alertmanager.yml
ts=2022-10-13T05:31:51.741Z caller=main.go:535 level=info msg=Listening address=:9093
ts=2022-10-13T05:31:51.741Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
ts=2022-10-13T05:31:53.708Z caller=cluster.go:705 level=info component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000127686s
ts=2022-10-13T05:32:01.709Z caller=cluster.go:697 level=info component=cluster msg="gossip settled; proceeding" elapsed=10.001595026s
ts=2022-10-13T05:32:43.074Z caller=notify.go:732 level=warn component=dispatcher receiver=web.hook integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"http://127.0.0.1:5001/\": dial tcp 127.0.0.1:5001: connect: connection refused"
ts=2022-10-13T05:37:43.074Z caller=dispatch.go:354 level=error component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="web.hook/webhook[0]: notify retry canceled after 17 attempts: Post \"http://127.0.0.1:5001/\": dial tcp 127.0.0.1:5001: connect: connection refused"
ts=2022-10-13T05:37:43.074Z caller=notify.go:732 level=warn component=dispatcher receiver=web.hook integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"http://127.0.0.1:5001/\": dial tcp 127.0.0.1:5001: connect: connection refused"
ts=2022-10-13T05:42:43.074Z caller=dispatch.go:354 level=error component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="web.hook/webhook[0]: notify retry canceled after 18 attempts: Post \"http://127.0.0.1:5001/\": dial tcp 127.0.0.1:5001: connect: connection refused"
ts=2022-10-13T05:42:43.075Z caller=notify.go:732 level=warn component=dispatcher receiver=web.hook integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"http://127.0.0.1:5001/\": dial tcp 127.0.0.1:5001: connect: connection refused"
$ docker logs monitoring2-otel-collector-1
2022-10-13T05:31:54.521Z	info	service/telemetry.go:102	Setting up own telemetry...
2022-10-13T05:31:54.521Z	info	service/telemetry.go:137	Serving Prometheus metrics	{"address": ":8888", "level": "basic"}
2022-10-13T05:31:54.521Z	debug	components/components.go:28	Stable component.{"kind": "exporter", "data_type": "traces", "name": "otlp", "stability": "stable"}
2022-10-13T05:31:54.521Z	info	components/components.go:30	In development component. May change in the future.	{"kind": "exporter", "data_type": "traces", "name": "logging", "stability": "in development"}
2022-10-13T05:31:54.521Z	debug	components/components.go:28	Stable component.{"kind": "processor", "name": "batch", "pipeline": "traces", "stability": "stable"}
2022-10-13T05:31:54.521Z	debug	components/components.go:28	Stable component.{"kind": "receiver", "name": "otlp", "pipeline": "traces", "stability": "stable"}
2022-10-13T05:31:54.543Z	info	extensions/extensions.go:42	Starting extensions...
2022-10-13T05:31:54.543Z	info	pipelines/pipelines.go:74	Starting exporters...
2022-10-13T05:31:54.543Z	info	pipelines/pipelines.go:78	Exporter is starting...	{"kind": "exporter", "data_type": "traces", "name": "otlp"}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Channel created	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] original dial target is: "promscale:9202"	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] parsed dial target is: {Scheme:promscale Authority: Endpoint:9202 URL:{Scheme:promscale Opaque:9202 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] fallback to scheme "passthrough"	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] parsed dial target is: {Scheme:passthrough Authority: Endpoint:promscale:9202 URL:{Scheme:passthrough Opaque: User: Host: Path:/promscale:9202 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Channel authority set to "promscale:9202"	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Resolver state updated: {
  "Addresses": [
    {
      "Addr": "promscale:9202",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Type": 0,
      "Metadata": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
} (resolver returned new addresses)	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Channel switches to new LB policy "pick_first"	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1 SubChannel #2] Subchannel created	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING	{"grpc_log": true}
2022-10-13T05:31:54.543Z	info	pipelines/pipelines.go:82	Exporter started.{"kind": "exporter", "data_type": "traces", "name": "otlp"}
2022-10-13T05:31:54.543Z	info	pipelines/pipelines.go:78	Exporter is starting...	{"kind": "exporter", "data_type": "traces", "name": "logging"}
2022-10-13T05:31:54.543Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1 SubChannel #2] Subchannel picks a new address "promscale:9202" to connect	{"grpc_log": true}
2022-10-13T05:31:54.544Z	info	pipelines/pipelines.go:82	Exporter started.{"kind": "exporter", "data_type": "traces", "name": "logging"}
2022-10-13T05:31:54.544Z	info	pipelines/pipelines.go:86	Starting processors...
2022-10-13T05:31:54.544Z	info	pipelines/pipelines.go:90	Processor is starting...	{"kind": "processor", "name": "batch", "pipeline": "traces"}
2022-10-13T05:31:54.544Z	info	pipelines/pipelines.go:94	Processor started.{"kind": "processor", "name": "batch", "pipeline": "traces"}
2022-10-13T05:31:54.544Z	info	pipelines/pipelines.go:98	Starting receivers...
2022-10-13T05:31:54.544Z	info	pipelines/pipelines.go:102	Receiver is starting...	{"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2022-10-13T05:31:54.544Z	info	zapgrpc/zapgrpc.go:174	[core] [Server #3] Server created	{"grpc_log": true}
2022-10-13T05:31:54.544Z	info	otlpreceiver/otlp.go:70	Starting GRPC server on endpoint 0.0.0.0:4317	{"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2022-10-13T05:31:54.544Z	info	zapgrpc/zapgrpc.go:174	[core] pickfirstBalancer: UpdateSubConnState: 0xc0003cb0e0, {CONNECTING <nil>}	{"grpc_log": true}
2022-10-13T05:31:54.544Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Channel Connectivity change to CONNECTING	{"grpc_log": true}
2022-10-13T05:31:54.544Z	info	otlpreceiver/otlp.go:88	Starting HTTP server on endpoint 0.0.0.0:4318	{"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2022-10-13T05:31:54.544Z	info	pipelines/pipelines.go:106	Receiver started.{"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2022-10-13T05:31:54.544Z	info	service/collector.go:215	Starting otelcol...	{"Version": "0.56.0", "NumCPU": 8}
2022-10-13T05:31:54.544Z	info	service/collector.go:128	Everything is ready. Begin running and processing data.
2022-10-13T05:31:54.544Z	info	zapgrpc/zapgrpc.go:174	[core] [Server #3 ListenSocket #4] ListenSocket created	{"grpc_log": true}
2022-10-13T05:31:55.522Z	warn	zapgrpc/zapgrpc.go:191	[core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "promscale:9202",
  "ServerName": "promscale:9202",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 192.168.129.70:9202: connect: connection refused"	{"grpc_log": true}
2022-10-13T05:31:55.522Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1 SubChannel #2] Subchannel Connectivity change to TRANSIENT_FAILURE	{"grpc_log": true}
2022-10-13T05:31:55.522Z	info	zapgrpc/zapgrpc.go:174	[core] pickfirstBalancer: UpdateSubConnState: 0xc0003cb0e0, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp 192.168.129.70:9202: connect: connection refused"}	{"grpc_log": true}
2022-10-13T05:31:55.522Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Channel Connectivity change to TRANSIENT_FAILURE	{"grpc_log": true}
2022-10-13T05:31:56.522Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1 SubChannel #2] Subchannel Connectivity change to IDLE	{"grpc_log": true}
2022-10-13T05:31:56.522Z	info	zapgrpc/zapgrpc.go:174	[core] pickfirstBalancer: UpdateSubConnState: 0xc0003cb0e0, {IDLE connection error: desc = "transport: Error while dialing dial tcp 192.168.129.70:9202: connect: connection refused"}	{"grpc_log": true}
2022-10-13T05:31:56.522Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Channel Connectivity change to IDLE	{"grpc_log": true}
2022-10-13T05:32:10.568Z	info	TracesExporter	{"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 18}
2022-10-13T05:32:10.568Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING	{"grpc_log": true}
2022-10-13T05:32:10.568Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1 SubChannel #2] Subchannel picks a new address "promscale:9202" to connect	{"grpc_log": true}
2022-10-13T05:32:10.569Z	info	zapgrpc/zapgrpc.go:174	[core] pickfirstBalancer: UpdateSubConnState: 0xc0003cb0e0, {CONNECTING <nil>}	{"grpc_log": true}
2022-10-13T05:32:10.569Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Channel Connectivity change to CONNECTING	{"grpc_log": true}
2022-10-13T05:32:10.570Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY	{"grpc_log": true}
2022-10-13T05:32:10.570Z	info	zapgrpc/zapgrpc.go:174	[core] pickfirstBalancer: UpdateSubConnState: 0xc0003cb0e0, {READY <nil>}	{"grpc_log": true}
2022-10-13T05:32:10.570Z	info	zapgrpc/zapgrpc.go:174	[core] [Channel #1] Channel Connectivity change to READY	{"grpc_log": true}
2022-10-13T05:32:15.576Z	info	TracesExporter	{"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 1}
2022-10-13T05:32:20.584Z	info	TracesExporter	{"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 1}
2022-10-13T05:32:25.592Z	info	TracesExporter	{"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 142}
2022-10-13T05:32:30.601Z	info	TracesExporter	{"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 1}
2022-10-13T05:32:35.610Z	info	TracesExporter	{"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 1}
2022-10-13T05:32:40.617Z	info	TracesExporter	{"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 4}
$ docker logs monitoring2-prometheus-1
ts=2022-10-13T05:31:59.034Z caller=main.go:499 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2022-10-13T05:31:59.034Z caller=main.go:543 level=info msg="Starting Prometheus Server" mode=server version="(version=2.39.1, branch=HEAD, revision=dcd6af9e0d56165c6f5c64ebbc1fae798d24933a)"
ts=2022-10-13T05:31:59.034Z caller=main.go:548 level=info build_context="(go=go1.19.2, user=root@273d60c69592, date=20221007-15:57:09)"
ts=2022-10-13T05:31:59.035Z caller=main.go:549 level=info host_details="(Linux 5.18.10-051810-generic #202207091532-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 9 15:55:09 UTC  x86_64 452af5bed03c (none))"
ts=2022-10-13T05:31:59.035Z caller=main.go:550 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2022-10-13T05:31:59.035Z caller=main.go:551 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-10-13T05:31:59.078Z caller=web.go:559 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2022-10-13T05:31:59.079Z caller=main.go:980 level=info msg="Starting TSDB ..."
ts=2022-10-13T05:31:59.080Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2022-10-13T05:31:59.127Z caller=head.go:551 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2022-10-13T05:31:59.161Z caller=head.go:595 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=34.231158ms
ts=2022-10-13T05:31:59.162Z caller=head.go:601 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2022-10-13T05:32:00.684Z caller=head.go:672 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=1
ts=2022-10-13T05:32:00.685Z caller=head.go:672 level=info component=tsdb msg="WAL segment loaded" segment=1 maxSegment=1
ts=2022-10-13T05:32:00.685Z caller=head.go:709 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=51.418µs wal_replay_duration=1.523174312s wbl_replay_duration=284ns total_replay_duration=1.55749006s
ts=2022-10-13T05:32:00.688Z caller=main.go:1001 level=info fs_type=9123683e
ts=2022-10-13T05:32:00.688Z caller=main.go:1004 level=info msg="TSDB started"
ts=2022-10-13T05:32:00.688Z caller=main.go:1184 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2022-10-13T05:32:00.688Z caller=dedupe.go:112 component=remote level=info remote_name=036793 url=http://promscale:9201/write msg="Starting WAL watcher" queue=036793
ts=2022-10-13T05:32:00.688Z caller=dedupe.go:112 component=remote level=info remote_name=036793 url=http://promscale:9201/write msg="Starting scraped metadata watcher"
ts=2022-10-13T05:32:00.689Z caller=dedupe.go:112 component=remote level=info remote_name=036793 url=http://promscale:9201/write msg="Replaying WAL" queue=036793
ts=2022-10-13T05:32:00.689Z caller=main.go:1221 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=1.166861ms db_storage=1.524µs remote_storage=599.125µs web_handler=590ns query_engine=1.35µs scrape=222.539µs scrape_sd=53.897µs notify=3.439µs notify_sd=1.996µs rules=1.591µs tracing=3.255µs
ts=2022-10-13T05:32:00.689Z caller=main.go:965 level=info msg="Server is ready to receive web requests."
ts=2022-10-13T05:32:00.689Z caller=manager.go:943 level=info component="rule manager" msg="Starting rule manager..."
ts=2022-10-13T05:32:08.364Z caller=dedupe.go:112 component=remote level=info remote_name=036793 url=http://promscale:9201/write msg="Done replaying WAL" duration=7.675696513s
ts=2022-10-13T05:32:11.241Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1665587073423 maxt=1665590400000 ulid=01GF7X87C9VQ6AH2N9DVRKVFK2 duration=2.784734429s
ts=2022-10-13T05:32:11.245Z caller=head.go:1192 level=info component=tsdb msg="Head GC completed" caller=truncateMemory duration=2.336689ms
ts=2022-10-13T05:32:13.936Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1665590401457 maxt=1665597600000 ulid=01GF7X8A3D8YM89TJJPCPKKF7M duration=2.691490605s
ts=2022-10-13T05:32:13.940Z caller=head.go:1192 level=info component=tsdb msg="Head GC completed" caller=truncateMemory duration=3.269452ms
ts=2022-10-13T05:32:30.690Z caller=dedupe.go:112 component=remote level=info remote_name=036793 url=http://promscale:9201/write msg="Remote storage resharding" from=1 to=9
ts=2022-10-13T05:32:40.689Z caller=dedupe.go:112 component=remote level=warn remote_name=036793 url=http://promscale:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1665639143 minSendTimestamp=1665639150
ts=2022-10-13T05:32:50.689Z caller=dedupe.go:112 component=remote level=info remote_name=036793 url=http://promscale:9201/write msg="Currently resharding, skipping."
ts=2022-10-13T05:33:20.689Z caller=dedupe.go:112 component=remote level=warn remote_name=036793 url=http://promscale:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1665639182 minSendTimestamp=1665639190
ts=2022-10-13T05:33:40.689Z caller=dedupe.go:112 component=remote level=info remote_name=036793 url=http://promscale:9201/write msg="Remote storage resharding" from=9 to=1

@KES777 Is this still an issue? It looks like the containers are unable to reach the ones they depend on. It might be a network issue on the Docker daemon side.

Just tried. Yes, this is still an issue.

(screenshot)

This is a single stack, so I do not think this is a network issue.

But after server boot, if I restart the stack once more,
(screenshot)

then everything works fine:
(screenshot)

Digging a bit, I found that I needed to add restart: unless-stopped. With that, all services start.

Could you please adjust docker-compose.yml?
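
For reference, the workaround amounts to adding a restart policy per service in docker-compose.yml, roughly like this (a minimal sketch with assumed image names; only the restart lines are the point):

services:
  promscale:
    image: timescale/promscale:latest   # assumed image
    restart: unless-stopped             # come back up with the daemon unless explicitly stopped
  prometheus:
    image: prom/prometheus:latest       # assumed image
    restart: unless-stopped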

unless-stopped is keyed to whether a user explicitly stopped the container rather than to failures, so it isn't the right restart policy to add here. I will include restart: on-failure instead, which makes sure the container is restarted if it crashes.

unless-stopped can also act as an anti-pattern: when the user wants to stop the container, it comes back up again.
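
For comparison, the on-failure variant would look roughly like this (again a sketch with assumed image names):

services:
  promscale:
    image: timescale/promscale:latest   # assumed image
    restart: on-failure                 # restart only when the container exits with a non-zero status

on-failure keys off the exit status, so it restarts crashes (non-zero exits) but not clean exits or containers stopped by the user.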

@KES777 Is this still an issue? Closing it for now. Feel free to re-open if you are still facing this with docker-compose.