josegonzalez/python-github-backup

Proposal to use API repository `pushed_at` instead of `updated_at` field for `last_updated`

kenbailey opened this issue · 0 comments

The last_updated timestamp file used by the --incremental option is set from the updated_at field in the GitHub Repositories API. However, the updated_at field appears to get updated only by specific types of activity (it isn't even updated by code pushes).

https://stackoverflow.com/questions/15918588/github-api-v3-what-is-the-difference-between-pushed-at-and-updated-at

Thus the --incremental option is using a since date in the API call that could be older than the last time the script was run. In my case with 90+ repositories, the last_updated is currently 3 weeks behind.

The pushed_at field for the repositories is updated for things like code pushes. I'd be willing to submit a PR, if this makes sense. It would replace updated_at with pushed_at in this line:

last_update = max(list(repository['updated_at'] for repository in repositories) or [time.strftime('%Y-%m-%dT%H:%M:%SZ', time.localtime())]) # noqa

An alternative approach would be to take the max of updated_at and pushed_at. There are scenarios where updated_at is the later date.
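
To make the alternative concrete, here is a rough sketch of what I have in mind, not a tested patch. The helper name latest_activity is made up for illustration, and I'm assuming a field can be missing or null, so those values are skipped. It relies on the timestamps all being in the same UTC format, so plain string comparison orders them correctly. The fallback uses time.gmtime() because the 'Z' suffix denotes UTC; the existing line uses time.localtime().

```python
import time


def latest_activity(repositories):
    """Return the newest timestamp across updated_at and pushed_at.

    Hypothetical helper, not part of the current script.
    """
    stamps = [
        repository[field]
        for repository in repositories
        for field in ('updated_at', 'pushed_at')
        if repository.get(field)  # assumption: skip missing or null values
    ]
    # With nothing to compare, fall back to the current time in UTC.
    now = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())
    return max(stamps or [now])
```

The one-liner above would then become last_update = latest_activity(repositories).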

Thanks, this script is working wonderfully for us.