Failed to check Cincinnati for updates

Question

gongx opened this issue 3 years ago · 6 comments

Bug Report

I am setting up Zincati with the fleet_lock strategy,
but I keep getting this error:

[INFO  zincati::update_agent::actor] reached steady state, periodically polling for updates
zincati[8956]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)
zincati[8956]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)

I did not set any configuration for Cincinnati. So, I think that it should follow the default behavior.

[INFO  zincati::cli::agent] starting update agent (zincati 0.0.23)
[INFO  zincati::strategy::fleet_lock] remote fleet_lock reboot manager: http://fleetlock.fleetlock.svc.cluster.local:8080/
[INFO  zincati::cincinnati] Cincinnati service: https://updates.coreos.fedoraproject.org
[INFO  zincati::cli::agent] agent running on node '9dbff4370da742d1a76c7193ce119158', in update group 'kubelet'
[INFO  zincati::update_agent::actor] registering as the update driver for rpm-ostree
[INFO  zincati::update_agent::actor] initialization complete, auto-updates logic enabled
[INFO  zincati::strategy] update strategy: fleet_lock

Environment

Using Fedora CoreOS: fedora:fedora/aarch64/coreos/testing-devel

Expected Behavior

Zincati can connect to the Cincinnati service and periodically check whether an update is available.

Actual Behavior

failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)

Reproduction Steps

Start the zincati service; the error then shows up all the time.

Other Information

The host is in AWS. I am not sure whether this is related to the AWS network settings or to something else.
On the stable stream of Fedora CoreOS (fedora:fedora/x86_64/coreos/stable), I see the same error: "failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)"

Answer 1 · 2021-12-01T19:21:15.000Z

Thanks for the report. Can you please attach the full logs (with timestamps) coming from journalctl -b 0 -u zincati.service?

My guess is that you are seeing some sporadic errors (and spaced over time) from the Fedora infra. Looking at my own nodes, I can also see a few transient hiccups logged today.

Overall, it shouldn't be a problem, as the agent re-checks for updates after a few minutes. You can check the current agent status with systemctl status zincati.service | grep Status, or you can record its metrics on an ongoing basis.
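
For recording the metrics mentioned above, here is a minimal Python sketch. It assumes Zincati exposes its Prometheus-format metrics on a local Unix socket at /run/zincati/public/metrics.promsock and writes them out as soon as a client connects; that path and behavior are assumptions here, so please verify them against the Zincati documentation for your version. The function name read_metrics is only illustrative.

```python
import socket

# Assumed location of the Zincati metrics socket; verify this on your host.
METRICS_SOCKET = "/run/zincati/public/metrics.promsock"


def read_metrics(path: str = METRICS_SOCKET, timeout: float = 5.0) -> str:
    """Connect to a Unix stream socket and return everything it sends until EOF."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling print(read_metrics()) on the node, most likely with root privileges, should dump the current counters, which can then be stored or compared over time.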

Answer 2 · 2021-12-01T19:25:44.000Z

Actually, the comment above is valid only for your stable machines.

The testing-devel stream does not support auto-updates, so this can't really work. In that case the default Zincati configuration even disables the update logic. Did you manually override that?

Answer 3 · 2021-12-01T19:35:10.000Z

Yes, I manually enabled it on the testing-devel Fedora CoreOS node. Thank you for confirming that the testing-devel stream does not support auto-updates.

Answer 4 · 2021-12-01T19:37:36.000Z

For reference, these are all the FCOS update streams: https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/

Answer 5 · 2021-12-01T21:52:09.000Z

Thank you, @lucab. I checked the metrics on the Fedora CoreOS stable node:
zincati_cincinnati_update_checks_errors_total{kind="client_failed_request"} 5
zincati_cincinnati_update_checks_total 204
It looks like there are some sporadic errors.

And for the node running the testing-devel stream:
zincati_cincinnati_update_checks_errors_total{kind="generic_http_500"} 2074
zincati_cincinnati_update_checks_total 2074

Since the testing-devel stream does not support auto-updates, is this failure expected?

Answer 6 · 2021-12-07T17:17:58.000Z

> Since the testing-devel stream does not support auto-updates, is this failure expected?

Yes, as you can see from the 100% failure rate (2074 errors out of 2074 checks).
By comparison, your other node saw about 2% of its requests fail temporarily (5 out of 204), which is a reasonable SLI.
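
To make that comparison concrete, here is a small Python sketch that computes the failure ratio from the two counters quoted in this thread. The helper names parse_metrics and error_ratio are hypothetical, and the parser is deliberately simplified: it assumes one "series value" pair per line, with no trailing timestamp.

```python
def parse_metrics(text: str) -> dict:
    """Parse simple Prometheus text lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def error_ratio(samples: dict) -> float:
    """Sum all update-check error counters and divide by the total checks."""
    total = samples.get("zincati_cincinnati_update_checks_total", 0.0)
    errors = sum(
        value
        for series, value in samples.items()
        if series.startswith("zincati_cincinnati_update_checks_errors_total")
    )
    return errors / total if total else 0.0


# Counter values as reported earlier in this thread.
STABLE = """
zincati_cincinnati_update_checks_errors_total{kind="client_failed_request"} 5
zincati_cincinnati_update_checks_total 204
"""
TESTING_DEVEL = """
zincati_cincinnati_update_checks_errors_total{kind="generic_http_500"} 2074
zincati_cincinnati_update_checks_total 2074
"""

print(f"{error_ratio(parse_metrics(STABLE)):.1%}")         # prints 2.5%
print(f"{error_ratio(parse_metrics(TESTING_DEVEL)):.1%}")  # prints 100.0%
```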

I'm going ahead and closing this ticket. There are some improvements that could be made on the backend to report back to clients that an invalid arch+stream combination has been requested; coreos/fedora-coreos-cincinnati#64 tracks that.