Handle shallow clones properly
Currently, Git Hammer requires a full clone of the repository, since it processes every commit that is referenced. It would be good to also support shallow clones, in which older history is not available.
This is somewhat related to issue #5 since the resolution to that issue would also require starting from one or more commits that have parents.
Shallow clones are now supported in the sense that a shallowly cloned repository can be processed. Unshallowing such a repository is not currently supported, because the data computed during the shallow processing, which is then incorrect, will not be recomputed. I'm going to close this issue and possibly open one or more new issues about updating existing data.
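Git records the boundary of a shallow clone in a file named `shallow` inside the `.git` directory. It holds one commit ID per line for each commit whose parents were not fetched, and the file is absent in a full clone. The following is a minimal sketch, not Git Hammer's code, and the helper names are hypothetical. It shows how a tool could remember which boundary commits it processed and later compare them with the current file. That would reveal which statistics went stale after an unshallow or deepen operation.

```python
from pathlib import Path


def shallow_boundaries(git_dir):
    """Return the commit IDs listed in <git_dir>/shallow.

    In a shallow clone, Git lists here the commits whose parents were not
    fetched. A full clone has no such file, so the result is an empty set.
    """
    shallow_file = Path(git_dir) / "shallow"
    if not shallow_file.is_file():
        return set()
    return {
        line.strip()
        for line in shallow_file.read_text().splitlines()
        if line.strip()
    }


def stale_boundaries(processed_boundaries, git_dir):
    """Hypothetical helper: boundary commits processed earlier that are no
    longer shallow (e.g. after `git fetch --unshallow` or `--deepen`).

    Statistics for these commits credited their whole content to their
    authors, so they would need to be recomputed.
    """
    return set(processed_boundaries) - shallow_boundaries(git_dir)
```

For example, if the boundary commits were stored when the shallow clone was first processed, a non-empty result from `stale_boundaries` signals that the stored data no longer matches the repository's history.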
Could we have a feature to ignore the very first commit? That is, use it as the starting point for the diff to the next commit, but not count its contents in the statistics?
What would be the benefit of ignoring just the first commit? What kind of use case do you have in mind?
My understanding, from reading this comment on #5 (comment), was that in a shallow clone all the pre-existing files would be counted in that first commit, but maybe it doesn't work that way? I wasn't entirely sure at the time. I was trying to count changes made in 2019, and I worried that all of the files from before then would be credited to the first person to commit to the repository, as if theirs were the 'init' commit. My other thought was about projects that existed before Git: the person who imports such a project into Git suddenly gets a big jump.
I think I see what you mean. You'd like to have the initial state, at the first commit, to count as 0 so that any statistics would be based only on the changes during the period of inspection, excluding what was in the repository before. Does that sound right? That sounds like useful functionality.
You're right that everything present in the initial commit of the repository will count for the person who made that commit, since there is no history beyond it. This applies to the earliest commit of a shallow clone too: even though the remote repository might have more history, the clone does not have it available. Of course, this makes the statistics "incorrect", just as with an existing code base imported into Git, but it's the best that can be done with the information available.
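The current behaviour and the baseline option proposed earlier in this thread can be contrasted with a simplified model. This sketch is not how Git Hammer computes its statistics. It follows a single file through a linear history, and each commit is modelled as a hypothetical `(author, lines)` pair that holds the full file content after that commit. Python's `difflib` is used to credit each line to the commit that introduced it. The `baseline_first` parameter is a hypothetical name for the requested feature.

```python
from collections import Counter
from difflib import SequenceMatcher


def attribute_lines(history, baseline_first=False):
    """Count surviving lines per author over a linear, single-file history.

    `history` is a list of (author, lines) pairs in commit order, where
    `lines` is the full file content after that commit (a simplified,
    hypothetical model). Lines that survive from an earlier commit keep
    their original author. With baseline_first=True, lines present in the
    first commit are treated as pre-existing and credited to nobody.
    """
    owned = []  # (line, author) pairs describing the current file state
    for index, (author, lines) in enumerate(history):
        credit = None if (baseline_first and index == 0) else author
        old_lines = [line for line, _ in owned]
        matcher = SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        new_owned = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever introduced them.
                new_owned.extend(owned[i1:i2])
            else:
                # Inserted or replaced lines are credited to this commit;
                # for a pure deletion, lines[j1:j2] is empty.
                new_owned.extend((line, credit) for line in lines[j1:j2])
        owned = new_owned
    return Counter(a for _, a in owned if a is not None)
```

With `history = [("alice", ["a", "b", "c"]), ("bob", ["a", "b", "c", "d"]), ("carol", ["a", "x", "c", "d"])]`, the default call credits alice with the two surviving lines from the first commit. With `baseline_first=True`, only bob's and carol's changes are counted, so the first commit's content counts as 0.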