Failed to check Cincinnati for updates
gongx opened this issue · 6 comments
Bug Report
I am setting up Zincati with the fleet_lock strategy.
But I keep getting this error:
[INFO zincati::update_agent::actor] reached steady state, periodically polling for updates
zincati[8956]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)
zincati[8956]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)
I did not set any configuration for Cincinnati. So, I think that it should follow the default behavior.
[INFO zincati::cli::agent] starting update agent (zincati 0.0.23)
[INFO zincati::strategy::fleet_lock] remote fleet_lock reboot manager: http://fleetlock.fleetlock.svc.cluster.local:8080/
[INFO zincati::cincinnati] Cincinnati service: https://updates.coreos.fedoraproject.org
[INFO zincati::cli::agent] agent running on node '9dbff4370da742d1a76c7193ce119158', in update group 'kubelet'
[INFO zincati::update_agent::actor] registering as the update driver for rpm-ostree
[INFO zincati::update_agent::actor] initialization complete, auto-updates logic enabled
[INFO zincati::strategy] update strategy: fleet_lock
Environment
Using Fedora CoreOS: fedora:fedora/aarch64/coreos/testing-devel
Expected Behavior
can connect to Cincinnati service and periodically check if any update is available
Actual Behavior
failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)
Reproduction Steps
start the zincati service and it happens all the time
Other Information
The host is in AWS. I am not sure whether this is related to the AWS network settings or something else.
On the stable stream of Fedora CoreOS (fedora:fedora/x86_64/coreos/stable), I see the same error: "failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)"
Thanks for the report. Can you please attach the full logs (with timestamps) coming from journalctl -b 0 -u zincati.service?
My guess is that you are seeing some sporadic errors (spaced out over time) from the Fedora infra. Looking at my own nodes, I can also see a few transient hiccups logged today.
Overall, it shouldn't be a problem, as the agent re-checks for updates after a few minutes. You can check the current agent status with systemctl status zincati.service | grep Status, or you can record its metrics on an ongoing basis.
Actually, the comment above is valid only for your stable machines.
The testing-devel stream does not support auto-updates, so it can't really work. The default Zincati configuration in that case even disables the update logic. Did you manually override that?
Yes, I manually enabled it on the testing-devel CoreOS node. Thank you for confirming that the testing-devel stream does not support auto-updates.
For reference, these are all the FCOS update streams: https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/
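To see how a given architecture and stream combination behaves, one option is to query the update graph directly. The sketch below only builds the request URL and defines a probe helper. It assumes the service exposes a /v1/graph endpoint that takes basearch and stream query parameters; the helper names graph_url and probe are hypothetical and not part of Zincati.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

# Base URL taken from the Zincati log line in this report.
BASE_URL = "https://updates.coreos.fedoraproject.org"


def graph_url(basearch, stream, base_url=BASE_URL):
    """Build the graph URL (assumed endpoint: /v1/graph?basearch=...&stream=...)."""
    query = urlencode({"basearch": basearch, "stream": stream})
    return f"{base_url}/v1/graph?{query}"


def probe(basearch, stream):
    """Return (HTTP status, number of graph nodes) for one arch+stream pair."""
    request = urllib.request.Request(
        graph_url(basearch, stream),
        headers={"Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            graph = json.load(response)
            return response.status, len(graph.get("nodes", []))
    except urllib.error.HTTPError as err:
        # Server-side failures (such as the 500 in this report) end up here.
        return err.code, 0


print(graph_url("x86_64", "stable"))
print(graph_url("aarch64", "testing-devel"))
```

Calling probe("x86_64", "stable") needs network access; an error status for a pair such as aarch64 with testing-devel would be consistent with the behavior reported in this issue.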
Thank you, @lucab. I checked the metrics on the CoreOS stable node:
zincati_cincinnati_update_checks_errors_total{kind="client_failed_request"} 5
zincati_cincinnati_update_checks_total 204
It looks like there are some sporadic errors.
And for the node that is using the testing-devel version of CoreOS:
zincati_cincinnati_update_checks_errors_total{kind="generic_http_500"} 2074
zincati_cincinnati_update_checks_total 2074
Because the testing-devel stream does not support auto-updates, is this failure expected?
Yes, as you can see from the 100% failure rate.
By comparison, your other node hit only ~2% temporarily failed requests, which is a reasonable SLI.
I'm going ahead and closing this ticket. There are some improvements that could be made on the backend to report back to clients that an invalid arch+stream combination has been requested; coreos/fedora-coreos-cincinnati#64 tracks that.
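The failure ratios quoted above can be reproduced from the two counters. The following is a minimal sketch, assuming the metrics are available as Prometheus text-format lines like the ones pasted in this thread; the error_ratio helper is hypothetical and not part of Zincati.

```python
import re

# One Prometheus text-format sample: metric name, optional {labels}, value.
SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)

ERRORS = "zincati_cincinnati_update_checks_errors_total"
TOTAL = "zincati_cincinnati_update_checks_total"


def error_ratio(metrics_text):
    """Return failed checks / total checks, or None if no checks were recorded."""
    errors = 0.0
    total = None
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE.match(line)
        if match is None:
            continue
        value = float(match.group("value"))
        if match.group("name") == ERRORS:
            errors += value  # sum across every error kind label
        elif match.group("name") == TOTAL:
            total = value
    if not total:
        return None
    return errors / total


stable = """\
zincati_cincinnati_update_checks_errors_total{kind="client_failed_request"} 5
zincati_cincinnati_update_checks_total 204
"""

testing_devel = """\
zincati_cincinnati_update_checks_errors_total{kind="generic_http_500"} 2074
zincati_cincinnati_update_checks_total 2074
"""

print(f"stable: {error_ratio(stable):.1%}")                # stable: 2.5%
print(f"testing-devel: {error_ratio(testing_devel):.1%}")  # testing-devel: 100.0%
```

Summing over all error kinds keeps the ratio meaningful when a node reports more than one failure label.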