netdata/helmchart

Similar issue to #122 - Deployment Issue/System only stays up for minutes.

DylanDKnight opened this issue · 21 comments

I seem to be having a similar issue to #122: the node shows up in Netdata Cloud for around a minute, and lets me view the Netdata logs, before dropping off.

I am on agent v1.25.0, running on GCP Kubernetes (GKE) v1.16.13-gke.1.

[Screenshot: Netdata logs]

Then it will switch to UNREACHABLE:

[Screenshot: node shown as UNREACHABLE]

When it comes back up, it shows a gap in the metrics; each time, the uptime between drops is nearly the same.

[Screenshot: gaps in the metrics between drops]

I have attached the parent logs here: https://gist.github.com/DylanDKnight/7c9d73459eb64211d4298a65d14d11a2

I have attached the child logs here: https://gist.github.com/DylanDKnight/a6a6d45541ca729ccf2df79cc622e2f9

This is the Helm install command I am using; with it, the node stays up for a few minutes at a time.

helm install \
  --set parent.resources.limits.cpu=1 \
  --set parent.resources.requests.cpu=1 \
  --set parent.resources.limits.memory=1Gi \
  --set parent.resources.requests.memory=1Gi \
  --set child.resources.limits.cpu=1 \
  --set child.resources.requests.cpu=1 \
  --set child.resources.limits.memory=1Gi \
  --set child.resources.requests.memory=1Gi \
  --set parent.database.persistence=true \
  --set parent.alarms.persistence=true \
  --set parent.claiming.enabled=true \
  --set service.port=19998 \
  --set parent.claiming.token="TOKEN" \
  --set parent.claiming.rooms="ROOM" \
  netdata ./netdata-helmchart/charts/netdata
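Each --set flag above addresses a nested key of the chart's values.yaml by its dotted path. As a rough illustration, here is a hypothetical helper (not Helm's own parser: it ignores Helm's comma-separated lists, escaping and array indices) that folds such flags into the nested structure a values file would contain:

```python
def set_flags_to_values(flags: list[str]) -> dict:
    """Fold Helm-style "key.path=value" strings (as passed to --set) into a
    nested dict shaped like a values file.

    Simplified sketch: ignores Helm's comma-separated lists, escaping and
    array indices, and only converts "true"/"false" and plain integers.
    """
    values: dict = {}
    for flag in flags:
        path, _, raw = flag.partition("=")
        # Approximate Helm's type inference for booleans and integers.
        if raw in ("true", "false"):
            value = raw == "true"
        elif raw.lstrip("-").isdigit():
            value = int(raw)
        else:
            value = raw
        # Walk or create one dict level per dotted key segment.
        *parents, leaf = path.split(".")
        node = values
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return values


flags = [
    "parent.resources.limits.cpu=1",
    "parent.resources.limits.memory=1Gi",
    "parent.claiming.enabled=true",
    "service.port=19999",
]
print(set_flags_to_values(flags))
```

The resulting structure is what a values file passed to helm install with -f would contain, which can be easier to review than a long list of flags.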

I seem to have to use ./netdata-helmchart/charts/netdata. If I use

helm install netdata ./netdata-helmchart

I get:

Error: validation: chart.metadata is required
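Helm treats the CHART argument as a chart root and reads the chart's metadata from a Chart.yaml file there, so this error typically means there is no Chart.yaml at that path. In this checkout the chart apparently sits under charts/netdata, which would explain why only that path works. A small sketch (find_chart_dirs is a hypothetical helper, not part of Helm) that lists which directories of a checkout contain a Chart.yaml:

```python
from pathlib import Path


def find_chart_dirs(root: str) -> list[str]:
    """List directories under `root` that contain a Chart.yaml, i.e. paths
    Helm would accept as a chart root. Paths are relative to `root`."""
    base = Path(root)
    return sorted(
        chart_file.parent.relative_to(base).as_posix()
        for chart_file in base.rglob("Chart.yaml")
    )
```

For the layout described here, find_chart_dirs("./netdata-helmchart") should include "charts/netdata", the path that worked with helm install.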

Any help would be appreciated, let me know if you need me to grab anything else.

Hey @DylanDKnight,

Welcome to our community! I am so sorry that you are experiencing this issue, but we will get to the bottom of this! 🙇‍♂️

Thank you for providing such detailed bug information; it will greatly speed up the triaging :)

cc @cakrit because you seem to have insight on issue #122 , cc @prologic because ✌️😅

This line from the logs https://gist.github.com/DylanDKnight/7c9d73459eb64211d4298a65d14d11a2#file-gistfile1-txt-L333 looks to be related to whatever the root cause is.

@underhood @netdata/agent can you help with this? What could cause the entry above in the logs?

The entry is normal: no version negotiation has taken place at that point, so the agent falls back to the default ACLK version, 2. The agent shuts down because it receives a signal at

2020-10-04 03:26:51: netdata INFO : MAIN : SIGNAL: Received SIGTERM. Cleaning up to exit...
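The fallback described above amounts to "wait for the cloud's reply with a deadline, otherwise use a default". A minimal Python model of that pattern, assuming a queue stands in for the ACLK reply channel (hypothetical; this is not netdata's code), with the 3-second timeout and default version 2 taken from the log message:

```python
import queue

DEFAULT_ACLK_VERSION = 2  # default named in the log message
NEGOTIATION_TIMEOUT_S = 3.0  # "in time of 3s" in the log message


def negotiate_version(replies: "queue.Queue[int]",
                      timeout_s: float = NEGOTIATION_TIMEOUT_S,
                      default: int = DEFAULT_ACLK_VERSION) -> int:
    """Return the version the cloud replied with, or `default` if no reply
    arrives within `timeout_s` seconds (hypothetical model)."""
    try:
        return replies.get(timeout=timeout_s)
    except queue.Empty:
        return default
```

In this model, a missing reply is not a failure: the agent simply continues with version 2, which matches the reading that the entry is harmless.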

@DylanDKnight it could be that Kubernetes kills the parent's pod because the liveness/readiness probes do not succeed within 90 seconds.

The default liveness/readiness probe thresholds add up to 90 seconds, as seen here: https://github.com/netdata/helmchart/blob/master/charts/netdata/values.yaml#L76

The timestamps in your logs almost match: the pod initializes around 2020-10-04 04:25:26.192 BST (https://gist.github.com/DylanDKnight/7c9d73459eb64211d4298a65d14d11a2#file-gistfile1-txt-L2) and gets a SIGTERM around 2020-10-04 04:26:51.167 BST (https://gist.github.com/DylanDKnight/7c9d73459eb64211d4298a65d14d11a2#file-gistfile1-txt-L334), that is, about 85 seconds later.
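The gap quoted above can be checked directly from the two log timestamps; both are in BST, so no timezone handling is needed:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two 'YYYY-MM-DD HH:MM:SS.mmm' timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


elapsed = seconds_between("2020-10-04 04:25:26.192", "2020-10-04 04:26:51.167")
print(f"{elapsed:.3f} s")  # 84.975 s
```

At just under 85 seconds, the pod is killed shortly before the 90-second mark.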

Could you try increasing these values with:

--set parent.livenessProbe.failureThreshold=5
--set parent.readinessProbe.failureThreshold=5
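A rough way to reason about these settings: kubelet restarts a container after failureThreshold consecutive liveness-probe failures, with one probe every periodSeconds, so the Kubernetes documentation estimates the window as failureThreshold × periodSeconds, plus any initialDelaySeconds. Assuming the chart's defaults at the time were failureThreshold=3 and periodSeconds=30 (consistent with the 90 seconds mentioned above, but worth checking in values.yaml), the suggested change stretches that window as follows:

```python
def liveness_kill_window(failure_threshold: int, period_seconds: int,
                         initial_delay_seconds: int = 0) -> int:
    """Upper-bound estimate (seconds) before kubelet restarts a container
    whose liveness probe never succeeds."""
    return initial_delay_seconds + failure_threshold * period_seconds


ASSUMED_PERIOD_SECONDS = 30  # assumption: chart default periodSeconds

for threshold in (3, 5):
    print(threshold, liveness_kill_window(threshold, ASSUMED_PERIOD_SECONDS))
# 3 90
# 5 150
```

Only the liveness probe triggers restarts; a failing readiness probe removes the pod from the Service's endpoints without killing it.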

Thanks

@knatsakis

I increased the liveness/readiness probe failure thresholds.

The pods do stay up longer now, but still suffer the same issue.

Parent Logs: https://gist.github.com/DylanDKnight/02e1f3e9317306daf56f8a701f69682a

[Screenshot: Netdata Cloud]

I have dug into the pod, which I can now do (before, it would drop before I could query the events):

Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Warning  FailedScheduling        18m (x3 over 18m)   default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled               18m                 default-scheduler        Successfully assigned default/netdata-parent-6f64dd8f64-jjbtd to gke-binance-futures--btc-usdt-market--d0f476b6-0xps
  Normal   SuccessfulAttachVolume  18m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-120e24aa-3aed-43a8-82a6-7959eb7eea7b"
  Normal   SuccessfulAttachVolume  18m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-ec78679b-ffc7-486a-8c06-9e1f8b6a021b"
  Normal   Killing                 15m                 kubelet                  Container netdata failed liveness probe, will be restarted
  Normal   Pulling                 15m (x2 over 18m)   kubelet                  Pulling image "netdata/netdata:v1.25.0"
  Normal   Created                 15m (x2 over 18m)   kubelet                  Created container netdata
  Normal   Started                 15m (x2 over 18m)   kubelet                  Started container netdata
  Warning  Unhealthy               14m (x7 over 17m)   kubelet                  Readiness probe failed: Get http://10.16.3.27:19998/api/v1/info: dial tcp 10.16.3.27:19998: connect: connection refused
  Warning  Unhealthy               13m (x9 over 17m)   kubelet                  Liveness probe failed: Get http://10.16.3.27:19998/api/v1/info: dial tcp 10.16.3.27:19998: connect: connection refused
  Normal   Pulled                  3m2s (x7 over 18m)  kubelet                  Successfully pulled image "netdata/netdata:v1.25.0"

I have managed to get it to stay up.

I changed --set service.port=19998 back to --set service.port=19999 and the liveness/readiness probe is now succeeding.
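The events above show why the port mattered: the probes sent GET requests to port 19998 and got "connection refused", which suggests nothing inside the container was listening there (netdata's default web port is 19999). A minimal sketch of an httpGet-style check, using the Kubernetes rule that a status code from 200 to 399 counts as success and a connection error counts as failure (http_probe is a hypothetical helper; kubelet's own prober also handles headers, redirect limits and other details not modelled here):

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Mimic a Kubernetes httpGet probe: success iff the GET returns 200-399.

    Connection errors (e.g. "connection refused" when nothing listens on the
    port) count as failures, as do 4xx/5xx responses.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:  # must precede URLError (subclass)
        return 200 <= err.code < 400
    except (urllib.error.URLError, OSError):
        return False
```

Run from inside the cluster, http_probe("http://10.16.3.27:19998/api/v1/info") would be expected to fail in the setup reported above, while the same path on port 19999 should succeed once netdata is up.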

Now it appears to have the same issue as before:

Error
2020-10-05 14:46:50.492 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:50: netdata ERROR : ACLK_Query_0 : ACLK version negotiation failed. No reply to "hello" with "version" from cloud in time of 3s. Reverting to default ACLK version of 2.
Error
2020-10-05 14:46:59.847 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : WEB_SERVER[static1] : clients wants to STREAM metrics.
Error
2020-10-05 14:46:59.847 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : thread created with task id 210
Error
2020-10-05 14:46:59.847 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : set name of thread 210 to STREAM_RECEIVER
Error
2020-10-05 14:46:59.847 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : STREAM gke-binance-futures-mark-default-pool-162627ff-p8bd [10.146.15.228]:39684: receive thread created (task id 210)
Error
2020-10-05 14:46:59.849 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : WEB_SERVER[static6] : clients wants to STREAM metrics.
Error
2020-10-05 14:46:59.859 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata ERROR : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : HEALTH [gke-binance-futures-mark-default-pool-162627ff-p8bd]: cannot open health file: /var/lib/netdata/04759223-9d4c-46db-abd5-395d1f1ebe04/health/health-log.db.old (errno 2, No such file or directory)
Error
2020-10-05 14:46:59.865 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : WEB_SERVER[static4] : clients wants to STREAM metrics.
Error
2020-10-05 14:46:59.868 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : Host 'gke-binance-futures-mark-default-pool-162627ff-p8bd' (at registry as 'gke-binance-futures-mark-default-pool-162627ff-p8bd') with guid '04759223-9d4c-46db-abd5-395d1f1ebe04' initialized, os 'linux', timezone 'UTC', tags '', program_name 'netdata', program_version 'v1.25.0', update every 1, memory mode save, history entries 3996, streaming disabled (to '' with api key ''), health enabled, cache_dir '/var/cache/netdata/04759223-9d4c-46db-abd5-395d1f1ebe04', varlib_dir '/var/lib/netdata/04759223-9d4c-46db-abd5-395d1f1ebe04', health_log '/var/lib/netdata/04759223-9d4c-46db-abd5-395d1f1ebe04/health/health-log.db', alarms default handler '/usr/libexec/netdata/plugins.d/alarm-notify.sh', alarms default recipient 'root'
Error
2020-10-05 14:46:59.868 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : STREAM gke-binance-futures-mark-default-pool-162627ff-p8bd [receive from [10.146.15.228]:39684]: initializing communication...
Error
2020-10-05 14:46:59.868 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : STREAM gke-binance-futures-mark-default-pool-162627ff-p8bd [receive from [10.146.15.228]:39684]: Netdata is using the stream version 3.
Error
2020-10-05 14:46:59.868 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : Postponing health checks for 60 seconds, on host 'gke-binance-futures-mark-default-pool-162627ff-p8bd', because it was just connected.
Error
2020-10-05 14:46:59.868 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-p8bd,[10.146.15.228]:39684] : STREAM gke-binance-futures-mark-default-pool-162627ff-p8bd [receive from [10.146.15.228]:39684]: receiving metrics...
Error
2020-10-05 14:46:59.868 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : thread created with task id 211
Error
2020-10-05 14:46:59.868 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : set name of thread 211 to STREAM_RECEIVER
Error
2020-10-05 14:46:59.868 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : STREAM gke-binance-futures--btc-usdt-market--d0f476b6-0xps [10.16.3.1]:55206: receive thread created (task id 211)
Error
2020-10-05 14:46:59.869 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-82gw,[10.146.15.234]:59215] : thread created with task id 212
Error
2020-10-05 14:46:59.869 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-82gw,[10.146.15.234]:59215] : set name of thread 212 to STREAM_RECEIVER
Error
2020-10-05 14:46:59.869 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures-mark-default-pool-162627ff-82gw,[10.146.15.234]:59215] : STREAM gke-binance-futures-mark-default-pool-162627ff-82gw [10.146.15.234]:59215: receive thread created (task id 212)
Error
2020-10-05 14:46:59.885 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata ERROR : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : HEALTH [gke-binance-futures--btc-usdt-market--d0f476b6-0xps]: cannot open health file: /var/lib/netdata/19655a23-1800-4959-8b97-f9ffe13b214a/health/health-log.db.old (errno 2, No such file or directory)
Error
2020-10-05 14:46:59.889 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : Host 'gke-binance-futures--btc-usdt-market--d0f476b6-0xps' (at registry as 'gke-binance-futures--btc-usdt-market--d0f476b6-0xps') with guid '19655a23-1800-4959-8b97-f9ffe13b214a' initialized, os 'linux', timezone 'UTC', tags '', program_name 'netdata', program_version 'v1.25.0', update every 1, memory mode save, history entries 3996, streaming disabled (to '' with api key ''), health enabled, cache_dir '/var/cache/netdata/19655a23-1800-4959-8b97-f9ffe13b214a', varlib_dir '/var/lib/netdata/19655a23-1800-4959-8b97-f9ffe13b214a', health_log '/var/lib/netdata/19655a23-1800-4959-8b97-f9ffe13b214a/health/health-log.db', alarms default handler '/usr/libexec/netdata/plugins.d/alarm-notify.sh', alarms default recipient 'root'
Error
2020-10-05 14:46:59.889 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : STREAM gke-binance-futures--btc-usdt-market--d0f476b6-0xps [receive from [10.16.3.1]:55206]: initializing communication...
Error
2020-10-05 14:46:59.889 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : STREAM gke-binance-futures--btc-usdt-market--d0f476b6-0xps [receive from [10.16.3.1]:55206]: Netdata is using the stream version 3.
Error
2020-10-05 14:46:59.889 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : Postponing health checks for 60 seconds, on host 'gke-binance-futures--btc-usdt-market--d0f476b6-0xps', because it was just connected.
Error
2020-10-05 14:46:59.889 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : STREAM_RECEIVER[gke-binance-futures--btc-usdt-market--d0f476b6-0xps,[10.16.3.1]:55206] : STREAM gke-binance-futures--btc-usdt-market--d0f476b6-0xps [receive from [10.16.3.1]:55206]: receiving metrics...
Error
2020-10-05 14:46:59.895 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata INFO : WEB_SERVER[static5] : clients wants to STREAM metrics.
Error
2020-10-05 14:46:59.906 BST
netdata netdata-parent-686bdf57f9-jcvz9 2020-10-05 13:46:59: netdata LOG FLOOD PROTECTION too many logs (201 logs in 18 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process 'netdata' for 1182 seconds.

Digging a bit further into the logs:

The first log on startup is:
netdata ERROR : MAIN : Ignoring host prefix '/host': path '/host' failed to stat() (errno 2, No such file or directory)

The last log before a crash is:
netdata LOG FLOOD PROTECTION too many logs (201 logs in 30 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process 'netdata' for 1170 seconds.
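The numbers in these messages are consistent with a simple rule: once more than 200 lines are logged within a 1200-second window, logging is muted until that window ends, hence 1200 - 18 = 1182 and 1200 - 30 = 1170 seconds. The sketch below is a hypothetical model inferred from the messages alone, not netdata's implementation; its threshold parameter plays the role of the errors to trigger flood protection setting that comes up later in this thread.

```python
from typing import Optional, Tuple


class FloodProtection:
    """Hypothetical model of netdata's log flood protection, inferred only
    from messages such as "201 logs in 18 seconds, threshold is set to 200
    logs in 1200 seconds. Preventing more logs ... for 1182 seconds."

    Within a window of `period` seconds the first `threshold` lines are
    emitted; any further line is dropped for the rest of the window.
    """

    def __init__(self, threshold: int = 200, period: float = 1200.0):
        self.threshold = threshold
        self.period = period
        self.window_start: Optional[float] = None
        self.count = 0

    def log(self, now: float) -> Tuple[bool, float]:
        """Record a line at time `now` (seconds). Returns (emitted, muted_for),
        where muted_for is how long logging stays muted (0.0 if emitted)."""
        if self.window_start is None or now - self.window_start >= self.period:
            self.window_start, self.count = now, 0  # start a new window
        self.count += 1
        if self.count <= self.threshold:
            return True, 0.0
        return False, self.period - (now - self.window_start)
```

In this model, 200 lines followed by a 201st line 18 seconds into the window yields (False, 1182.0), matching the message above; with threshold=2_000_000 the same 201st line is simply emitted.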

Another thing to add to the issue.

When I only had MongoDB, Redis & PGBouncer deployed, Netdata ran without crashing; as soon as we deployed some of our in-house applications, it crashed.

@DylanDKnight I was able to reproduce the issue by setting service.port to 19998. I am trying to find a fix.

@knatsakis Nice,

I switched the port back to 19999, and that fixed that issue for me.

However, I still see this in the logs:
2020-10-07 22:04:08: netdata ERROR : MAIN : LISTENER: Invalid listen port 0 given. Defaulting to 19999. (errno 22, Invalid argument)

The current issue I am having is:

2020-10-07 22:04:19: netdata LOG FLOOD PROTECTION too many logs (201 logs in 10 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process 'netdata' for 1190 seconds.

That is the last message before it crashes.

Also, when the system is up, I only see Netdata's own metrics in Netdata Cloud: no CPU or even memory stats, and no way to add anything other than Netdata stats to a dashboard.

I am also seeing these errors on the child:

Error
2020-10-07 23:04:31.365 BST
2020-10-07 22:04:31: netdata LOG FLOOD PROTECTION too many logs (201 logs in 50 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process 'netdata' for 1150 seconds.
Error
2020-10-07 23:04:31.365 BST
2020-10-07 22:04:31: netdata ERROR : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: restart stream because socket reports errors (POLLERR) - 313263 bytes transmitted.
Error
2020-10-07 23:04:31.365 BST
2020-10-07 22:04:31: netdata ERROR : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: failed to send metrics - closing connection - we have sent 313263 bytes on this connection. (errno 9, Bad file descriptor)
Error
2020-10-07 23:04:31.365 BST
2020-10-07 22:04:31: netdata ERROR : PLUGIN[proc] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send]: not ready - discarding collected metrics.
Error
2020-10-07 23:04:31.365 BST
2020-10-07 22:04:31: netdata ERROR : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: error during read (-1). Restarting connection (errno 104, Connection reset by peer)
Error
2020-10-07 23:04:30.388 BST
2020-10-07 22:04:30: netdata INFO : PLUGINSD[apps] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send]: sending metrics...
Error
2020-10-07 23:04:30.372 BST
2020-10-07 22:04:30: netdata INFO : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: established communication with a parent using protocol version 3 - ready to send metrics...
Error
2020-10-07 23:04:30.372 BST
2020-10-07 22:04:30: netdata INFO : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: waiting response from remote netdata...
Error
2020-10-07 23:04:30.372 BST
2020-10-07 22:04:30: netdata INFO : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: initializing communication...
Error
2020-10-07 23:04:30.371 BST
2020-10-07 22:04:30: netdata INFO : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: connecting...
Error
2020-10-07 23:04:30.371 BST
2020-10-07 22:04:30: netdata ERROR : PLUGIN[cgroups] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send]: not ready - discarding collected metrics. (errno 22, Invalid argument)
Error
2020-10-07 23:04:30.371 BST
2020-10-07 22:04:30: netdata ERROR : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: error during read (-1). Restarting connection (errno 104, Connection reset by peer)
Error
2020-10-07 23:04:30.371 BST
2020-10-07 22:04:30: netdata INFO : PLUGIN[cgroups] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send]: sending metrics...
Error
2020-10-07 23:04:30.368 BST
2020-10-07 22:04:30: netdata INFO : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: established communication with a parent using protocol version 3 - ready to send metrics...
Error
2020-10-07 23:04:30.368 BST
2020-10-07 22:04:30: netdata INFO : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: waiting response from remote netdata...
Error
2020-10-07 23:04:30.368 BST
2020-10-07 22:04:30: netdata INFO : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: initializing communication...
Error
2020-10-07 23:04:30.367 BST
2020-10-07 22:04:30: netdata INFO : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: connecting...
Error
2020-10-07 23:04:30.367 BST
2020-10-07 22:04:30: netdata ERROR : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: restart stream because socket reports errors (POLLERR) - 386120 bytes transmitted.
Error
2020-10-07 23:04:30.367 BST
2020-10-07 22:04:30: netdata ERROR : STREAM_SENDER[gke-binance-futures-m-binance-futures-3f39fac9-6678] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send to netdata:19999]: failed to send metrics - closing connection - we have sent 386120 bytes on this connection. (errno 9, Bad file descriptor)
Error
2020-10-07 23:04:30.367 BST
2020-10-07 22:04:30: netdata ERROR : PLUGIN[proc] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [send]: not ready - discarding collected metrics. (errno 22, Invalid argument)

So I changed this setting

errors to trigger flood protection = 200

to 2000000, and judging by the logs, it now looks like it is trying to collect metrics/charts.

Error
2020-10-08 00:59:55.517 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu43_softirqs/HRTIMER.db.
Error
2020-10-08 00:59:55.517 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu43_softirqs/SCHED.db.
Error
2020-10-08 00:59:55.517 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu43_softirqs/TASKLET.db.
Error
2020-10-08 00:59:55.517 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu43_softirqs/NET_RX.db.
Error
2020-10-08 00:59:55.517 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu43_softirqs/NET_TX.db.
Error
2020-10-08 00:59:55.516 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu43_softirqs/TIMER.db.
Error
2020-10-08 00:59:55.516 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu43_softirqs/main.db.
Error
2020-10-08 00:59:55.516 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu42_softirqs/RCU.db.
Error
2020-10-08 00:59:55.516 BST
2020-10-07 23:59:55: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:37984] : Initializing file /var/cache/netdata/190dd876-0f7c-4008-8f06-cfef4aab4e69/cpu.cpu42_softirqs/SCHED.db.

but there are a lot of logs like these:

2020-10-08 00:59:56.355 BST
2020-10-07 23:59:56: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:2497] : RRDSET: chart name 'netdata.aclk_write_q' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-08 00:59:56.355 BST
2020-10-07 23:59:56: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:2497] : RRDSET: chart name 'netdata.aclk_query_per_second' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-08 00:59:56.355 BST
2020-10-07 23:59:56: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:2497] : RRDSET: chart name 'netdata.aclk_status' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.

and it crashes with:

Error
2020-10-08 00:59:56.397 BST
Shutting down spawn server loop complete.
Error
2020-10-08 00:59:56.397 BST
Shutting down spawn server event loop.
Error
2020-10-08 00:59:56.397 BST
EOF found in spawn pipe.

Hi @DylanDKnight, let me try to address all the issues that you mentioned, one by one.

The

2020-10-07 22:04:08: netdata ERROR : MAIN : LISTENER: Invalid listen port 0 given. Defaulting to 19999. (errno 22, Invalid argument)

error, although harmless, should be resolved by netdata/netdata#10045.
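The message itself describes a validate-then-default step: an invalid port (here 0) is rejected and the listener falls back to 19999, which is why it is harmless. A tiny sketch of that logic (effective_listen_port is a hypothetical helper; treating values above 65535 as invalid too is an assumption, since the log only shows the 0 case):

```python
DEFAULT_LISTEN_PORT = 19999  # default named in the log message


def effective_listen_port(configured: int,
                          default: int = DEFAULT_LISTEN_PORT) -> int:
    """Return `configured` if it is a usable TCP port (1-65535), otherwise
    fall back to `default`, as the "Invalid listen port 0 given. Defaulting
    to 19999" message describes (hypothetical sketch)."""
    if 1 <= configured <= 65535:
        return configured
    return default
```

Under this reading, the agent still ends up listening on 19999, which is also why the probes succeed once service.port is back at 19999.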

I am still working on the rest of the issues.

@knatsakis Not a problem; sorry for adding a chunk to your backlog! Haha.

If you need me to do any digging on my end, let me know.

Hey @DylanDKnight,

v2.0.11 of the helm chart (with appVersion v1.26.0) should contain all the relevant fixes.

Could you try it and let me know?

Thanks

@knatsakis

I did a clean clone and install.

Log-wise, there are fewer errors.

But it still appears to fall over every few minutes. The only logs I can see that contain errors are included below.

I also still don't see any charts apart from netdata charts in netdata.cloud.

If there is anything in particular I should look for or grab, let me know.

Error
2020-10-15 20:43:47.372 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : thread with task id 277 finished
Error
2020-10-15 20:43:47.372 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:5275]: receive thread ended (task id 277)
Error
2020-10-15 20:43:47.372 BST
netdata 2020-10-15 19:43:47: netdata ERROR : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:5275]: disconnected (completed 0 updates). (errno 22, Invalid argument)
Error
2020-10-15 20:43:47.372 BST
netdata 2020-10-15 19:43:47: netdata ERROR : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : requested a CHART, without a type.id, on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678'. Disabling it. (errno 22, Invalid argument)
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:5275]: receiving metrics...
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : Postponing health checks for 60 seconds, on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678', because it was just connected.
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:5275]: Netdata is using the stream version 3.
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:5275]: initializing communication...
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [10.16.0.1]:5275: receive thread created (task id 277)
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : set name of thread 277 to STREAM_RECEIVER
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:5275] : thread created with task id 277
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : WEB_SERVER[static1] : clients wants to STREAM metrics.
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : thread with task id 276 finished
Error
2020-10-15 20:43:47.370 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:14248]: receive thread ended (task id 276)
Error
2020-10-15 20:43:47.369 BST
netdata 2020-10-15 19:43:47: netdata ERROR : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:14248]: disconnected (completed 0 updates). (errno 22, Invalid argument)
Error
2020-10-15 20:43:47.369 BST
netdata 2020-10-15 19:43:47: netdata ERROR : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : requested a CHART, without a type.id, on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678'. Disabling it. (errno 22, Invalid argument)
Error
2020-10-15 20:43:47.367 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:14248]: receiving metrics...
Error
2020-10-15 20:43:47.367 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : Postponing health checks for 60 seconds, on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678', because it was just connected.
Error
2020-10-15 20:43:47.367 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:14248]: Netdata is using the stream version 3.
Error
2020-10-15 20:43:47.367 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:14248]: initializing communication...
Error
2020-10-15 20:43:47.367 BST
netdata 2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [10.16.0.1]:14248: receive thread created (task id 276)
Error
2020-10-15 20:43:47.367 BST
netdata2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : set name of thread 276 to STREAM_RECEIVER
Error
2020-10-15 20:43:47.367 BST
netdata2020-10-15 19:43:47: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:14248] : thread created with task id 276

Error
2020-10-15 20:46:37.379 BST
netdata2020-10-15 19:46:37: netdata INFO : WEB_SERVER[static3] : clients wants to STREAM metrics.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : thread with task id 345 finished
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:23380]: receive thread ended (task id 345)
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata ERROR : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:23380]: disconnected (completed 23 updates). (errno 22, Invalid argument)
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata ERROR : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : requested a CHART, without a type.id, on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678'. Disabling it. (errno 22, Invalid argument)
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.mem_usage_limit' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.mem_usage' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.pgfaults' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.mem_activity' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.writeback' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.mem' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.cpu_per_core' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.cpu_limit' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_kube_dns_autoscaler_645f7d66cf_s4w4j_543e5bb8_3ff1_44af_8552_62186282cf6d_autoscaler.cpu' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.mem_usage_limit' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.mem_usage' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.pgfaults' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.mem_activity' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.writeback' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.378 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.mem' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.377 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.cpu_per_core' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.377 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.cpu_limit' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.377 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_metrics_server_v0.3.6_64655c969_d4dd2_b584a363_6f1f_46df_b849_092c070338c1_metrics_server.cpu' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.377 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_fluentd_gke_wpgzk_81a9af93_0dc8_4ea9_9068_d6ebe3efd83e_fluentd_gcp.mem_usage_limit' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.377 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_fluentd_gke_wpgzk_81a9af93_0dc8_4ea9_9068_d6ebe3efd83e_fluentd_gcp.mem_usage' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.377 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_fluentd_gke_wpgzk_81a9af93_0dc8_4ea9_9068_d6ebe3efd83e_fluentd_gcp.pgfaults' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.377 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_fluentd_gke_wpgzk_81a9af93_0dc8_4ea9_9068_d6ebe3efd83e_fluentd_gcp.mem_activity' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.377 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : RRDSET: chart name 'cgroup_k8s_kube_system_fluentd_gke_wpgzk_81a9af93_0dc8_4ea9_9068_d6ebe3efd83e_fluentd_gcp.writeback' on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678' already exists.
Error
2020-10-15 20:46:37.376 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:23380]: receiving metrics...
Error
2020-10-15 20:46:37.376 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : Postponing health checks for 60 seconds, on host 'gke-binance-futures-m-binance-futures-3f39fac9-6678', because it was just connected.
Error
2020-10-15 20:46:37.376 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:23380]: Netdata is using the stream version 3.
Error
2020-10-15 20:46:37.376 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [receive from [10.16.0.1]:23380]: initializing communication...
Error
2020-10-15 20:46:37.376 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : STREAM gke-binance-futures-m-binance-futures-3f39fac9-6678 [10.16.0.1]:23380: receive thread created (task id 345)
Error
2020-10-15 20:46:37.376 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : set name of thread 345 to STREAM_RECEIVER
Error
2020-10-15 20:46:37.376 BST
netdata2020-10-15 19:46:37: netdata INFO : STREAM_RECEIVER[gke-binance-futures-m-binance-futures-3f39fac9-6678,[10.16.0.1]:23380] : thread created with task id 345
Error
2020-10-15 20:46:37.376 BST
netdata2020-10-15 19:46:37: netdata INFO : WEB_SERVER[static1] : clients wants to STREAM metrics.
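The excerpt above is easier to follow one session at a time. Each STREAM_RECEIVER session is identified by the child's source port (14248 and 23380 here), and in both sessions the parent logs "requested a CHART, without a type.id ... Disabling it" within the same second as the disconnect. Below is a minimal sketch for summarizing an exported log this way. It is not a Netdata tool: the summarize_sessions helper and its regular expressions are hypothetical and assume lines formatted exactly like the ones pasted in this thread.

```python
import re
from collections import defaultdict

# Hypothetical patterns, written against the log lines pasted in this thread.
PEER_RE = re.compile(r"STREAM_RECEIVER\[[^,]+,\[(?P<ip>[0-9.]+)\]:(?P<port>\d+)\]")
DISCONNECT_RE = re.compile(r"disconnected \(completed (?P<updates>\d+) updates\)")
NO_TYPE_ID_RE = re.compile(r"requested a CHART, without a type\.id")
DUPLICATE_RE = re.compile(r"RRDSET: chart name '[^']+' .* already exists")


def summarize_sessions(lines):
    """Group STREAM_RECEIVER log lines by the child's source port.

    For each port, record the number of completed updates reported at
    disconnect (None if no disconnect line was seen), whether the
    'CHART without a type.id' error appeared, and how many
    'chart name ... already exists' notices were logged.
    """
    sessions = defaultdict(
        lambda: {"updates": None, "chart_without_type_id": False, "duplicate_charts": 0}
    )
    for line in lines:
        peer = PEER_RE.search(line)
        if not peer:
            # Skips WEB_SERVER lines and the log viewer's "Error"/timestamp rows.
            continue
        session = sessions[int(peer.group("port"))]
        disconnect = DISCONNECT_RE.search(line)
        if disconnect:
            session["updates"] = int(disconnect.group("updates"))
        if NO_TYPE_ID_RE.search(line):
            session["chart_without_type_id"] = True
        if DUPLICATE_RE.search(line):
            session["duplicate_charts"] += 1
    return dict(sessions)
```

For the two sessions shown above, this reports 0 completed updates for port 14248 and 23 for port 23380, with the type.id error present in both. It only groups what the parent logged; it does not explain why the child sent a CHART line without a type.id.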

@DylanDKnight,

Unfortunately, I am not able to reproduce the issue. I used a freshly cloned repo and installed Netdata with:

helm install                                 \
  --set parent.resources.limits.cpu=1        \
  --set parent.resources.requests.cpu=1      \
  --set parent.resources.limits.memory=1Gi   \
  --set parent.resources.requests.memory=1Gi \
  --set child.resources.limits.cpu=1         \
  --set child.resources.requests.cpu=1       \
  --set child.resources.limits.memory=1Gi    \
  --set child.resources.requests.memory=1Gi  \
  --set parent.database.persistence=true     \
  --set parent.alarms.persistence=true       \
  --set parent.claiming.enabled=true         \
  --set service.port=19998                   \
  --set parent.claiming.token="TOKEN"        \
  --set parent.claiming.rooms="ROOM"         \
  netdata ./charts/netdata

It stays up after that.

Could you upload the full netdata parent logs somewhere, preferably from start to finish?

Also, the output of

kubectl describe pod netdata-parent-xxxxxx

may show relevant k8s events.

Thanks!

I pulled a fresh copy and did a clean install.

Here are the logs. I exported a large batch, as it's hard to see where it is falling over.

netdatalogs.xlsx

Output of kubectl describe for the Netdata parent pod:

Name:                 netdata-parent-bbd65d4fd-cmcwh
Namespace:            default
Priority:             1000
Priority Class Name:  low-priority
Node:                 gke-binance-futures-m-binance-futures-3f39fac9-6678/10.146.15.211
Start Time:           Wed, 21 Oct 2020 14:23:39 +0000
Labels:               app=netdata
                      pod-template-hash=bbd65d4fd
                      release=netdata
                      role=parent
Annotations:          checksum/config: 2abecb8f6dbe6015e7f499b85f4f1473da705653a59f43acd9a1273b4999d4d4
Status:               Running
IP:                   10.16.0.107
IPs:
  IP:           10.16.0.107
Controlled By:  ReplicaSet/netdata-parent-bbd65d4fd
Containers:
  netdata:
    Container ID:  docker://a58ca1b53bc58da02e445e2d119fe6df3689f45fe3ec5c9d2a59d657be2197f9
    Image:         netdata/netdata:v1.26.0
    Image ID:      docker-pullable://netdata/netdata@sha256:784cf58204a686ec461bd716d6697e4a842b7edbdeccf0ae4c4d0e8cd5186fc4
    Port:          19998/TCP
    Host Port:     0/TCP
    Command:
      sh
      -c
      exec /usr/sbin/run.sh -W set2 cloud global enabled true -W set2 cloud global "cloud base url" "https://app.netdata.cloud" -W "claim -token=FEYTdWgT5kR5e7b7_nTkxI8J-2ZzMtSZzlkgqgx3-pXGID2byoTc5E4G7P2EsRD4_v2K0Cvw9Zyfs_ej2mFKU0xDMQcr_tImLX9WPIoxLJKoDzEHyrzKxtLhawGJGqaSPTDgECU -rooms=c5cce931-f9d3-4f2d-917d-642a811542f9 -url=https://app.netdata.cloud"
    State:          Running
      Started:      Wed, 21 Oct 2020 14:24:01 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   http-get http://:http/api/v1/info delay=0s timeout=1s period=30s #success=1 #failure=3
    Readiness:  http-get http://:http/api/v1/info delay=0s timeout=1s period=30s #success=1 #failure=3
    Environment:
      MY_POD_NAME:            netdata-parent-bbd65d4fd-cmcwh (v1:metadata.name)
      MY_POD_NAMESPACE:       default (v1:metadata.namespace)
      NETDATA_LISTENER_PORT:  19998
    Mounts:
      /etc/netdata/health_alarm_notify.conf from config (rw,path="health")
      /etc/netdata/netdata.conf from config (rw,path="netdata")
      /etc/netdata/stream.conf from config (rw,path="stream")
      /var/cache/netdata from database (rw)
      /var/lib/netdata from alarms (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from netdata-token-ckq2f (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      netdata-conf-parent
    Optional:  false
  database:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  netdata-parent-database
    ReadOnly:   false
  alarms:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  netdata-parent-alarms
    ReadOnly:   false
  netdata-token-ckq2f:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  netdata-token-ckq2f
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Warning  FailedScheduling        87s (x2 over 88s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled               84s                default-scheduler        Successfully assigned default/netdata-parent-bbd65d4fd-cmcwh to gke-binance-futures-m-binance-futures-3f39fac9-6678
  Normal   SuccessfulAttachVolume  77s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-d37da175-16cd-4b27-ab64-2529d0d3eaf0"
  Normal   SuccessfulAttachVolume  74s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-97dbec77-d085-4b53-a675-e199993ddc8d"
  Normal   Pulling                 66s                kubelet                  Pulling image "netdata/netdata:v1.26.0"
  Normal   Pulled                  62s                kubelet                  Successfully pulled image "netdata/netdata:v1.26.0"
  Normal   Created                 62s                kubelet                  Created container netdata
  Normal   Started                 62s                kubelet                  Started container netdata

Interestingly, I see almost the same symptoms when setting:

[web]
   mode = none

in the following config (netdata-values.yaml):

parent:
  claiming:
    enabled: true
    token: XXX
    rooms: YYY
child:
  claiming:
    enabled: true
    token: XXX
    rooms: YYY
  configs:
    netdata:
      data: |
        [global]
          memory mode = ram
          history = 3600
          access log = none
          update every = 5
        [health]
          enabled = no
        [web]
          mode = none
ingress:
  enabled: false

which I use to update values via helm upgrade -f netdata-values.yaml netdata netdata/netdata

@mbuczko @DylanDKnight, are you guys still having problems?
The "pod has unbound immediate PersistentVolumeClaims" warning usually appears when a PVC points to a StorageClass that does not exist.

Closing due to lack of response.