benbjohnson/litestream

Restore freeze on GKE

Closed this issue · 3 comments

Hi there, I have been trying out Litestream on a work project recently. On my laptop it works great. On Google Kubernetes Engine the litestream restore command freezes. E.g.:

2023/07/18 17:43:49.871991 gcs: restoring snapshot c3d969219fd460ab/00000000 to /tmp/candle_cache.db.tmp
2023/07/18 17:43:50.127277 gcs: restoring wal files: generation=c3d969219fd460ab index=[00000000,00000002]
2023/07/18 17:43:50.201653 gcs: downloaded wal c3d969219fd460ab/00000002 elapsed=74.06275ms
2023/07/18 17:43:50.247752 gcs: downloaded wal c3d969219fd460ab/00000000 elapsed=120.351917ms
2023/07/18 17:43:50.262966 gcs: applied wal c3d969219fd460ab/00000000 elapsed=14.743875ms
2023/07/18 17:43:50.391388 gcs: downloaded wal c3d969219fd460ab/00000001 elapsed=263.961875ms
2023/07/18 17:43:50.401252 gcs: applied wal c3d969219fd460ab/00000001 elapsed=9.831083ms
2023/07/18 17:43:50.409585 gcs: applied wal c3d969219fd460ab/00000002 elapsed=8.287125ms

There should be a log message like gcs: renaming database from temporary location, but this never appears.

The /tmp location is a Kubernetes emptyDir volume, a scratch volume backed by whatever storage the node has available, so I'm not sure exactly what the underlying storage is. I figured I'd ask just in case: can you think of anything that might prevent the final file rename step but not the previous download steps? Thanks!

hifi commented

Hi, could you try with the latest release if you can still reproduce this?

I'm not deploying it for the time being as we are using a different method. I can close this ticket if that's appropriate.

hifi commented

Thanks for getting back. Please open a new issue if you still encounter this in the future!