kimmobrunfeldt/git-hours

Take diff size into account when estimating first commit work in each session

rMazeiks opened this issue · 0 comments

For the first commit of each "coding session", we cannot tell how much time was spent on it based on the timestamps alone. Currently, the algorithm simply uses a constant to estimate the work for those commits. I believe better estimates can be easily made.

Proposed algorithm:

  • Group commits into sessions, starting a new session wherever the gap between consecutive commits exceeds 2h (same as current algorithm)
  • Estimate average time to edit a line of code:
    • let known_work be the set of commits with known hours of work (i.e., all but the first commit in each session)
    • average time to edit a line of code = total time spent in known_work / total lines edited in known_work
  • Estimate total time spent:
    • For the first commit in each session, multiply the number of lines changed in that commit by the average time to edit a line of code
    • For other commits, assume the entire duration since the last commit was spent working (same as current algorithm)
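
To make the proposal concrete, here is a rough sketch of the steps above in Python (not the project's actual code). The `Commit` class, the function names, and the `fallback_first_commit_seconds` parameter are all made up for illustration. The fallback constant is only used when there is no known_work to calibrate against, and its value is an assumption.

```python
from dataclasses import dataclass

SESSION_GAP_SECONDS = 2 * 60 * 60  # 2h session boundary, as in the proposal


@dataclass
class Commit:  # hypothetical record, not a git-hours type
    timestamp: int      # commit time, seconds since epoch
    lines_changed: int  # additions + deletions in the commit's diff


def split_sessions(commits, max_gap=SESSION_GAP_SECONDS):
    """Group commits into sessions; a gap larger than max_gap starts a new one."""
    sessions = []
    for commit in sorted(commits, key=lambda c: c.timestamp):
        if sessions and commit.timestamp - sessions[-1][-1].timestamp <= max_gap:
            sessions[-1].append(commit)
        else:
            sessions.append([commit])
    return sessions


def estimate_seconds(commits, fallback_first_commit_seconds=2 * 60 * 60):
    """Estimate total work time in seconds using the diff-size heuristic."""
    sessions = split_sessions(commits)

    # known_work: every commit except the first of each session.
    known_time = 0
    known_lines = 0
    for session in sessions:
        for prev, cur in zip(session, session[1:]):
            known_time += cur.timestamp - prev.timestamp
            known_lines += cur.lines_changed

    total = known_time
    if known_lines == 0:
        # Nothing to calibrate against: fall back to a constant per session
        # (assumed value, chosen only for this sketch).
        return total + fallback_first_commit_seconds * len(sessions)

    seconds_per_line = known_time / known_lines
    for session in sessions:
        # First commit of each session: lines changed * average time per line.
        total += session[0].lines_changed * seconds_per_line
    return total
```

For example, a session with commits at 0s (50 lines), 600s (10 lines), and 1800s (20 lines) gives 1800s of known work over 30 lines, i.e. 60s per line, so the first commit would be estimated at 50 * 60 = 3000s.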

Other considerations and alternatives:

  • Instead of time per line, we could use time per character or another metric
  • Instead of estimating hours of work based on time per line, we could assume that the first commit in a session took about as long as the average of the other commits.
  • For first commits, the estimated time could be capped at the time elapsed since the previous commit
  • The current algorithm assumes there is a single author, if I understand correctly. It would be nice to compute time per author. Not only would this be a more detailed metric, it would also be more accurate, since 2 people committing at the same time shouldn't be considered a single session.
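
Two of these considerations are easy to sketch in Python: splitting commits per author before sessions are computed, and capping a first-commit estimate at the gap since the previous commit. The function names and the `(author, timestamp, lines_changed)` tuple layout are hypothetical, picked only for illustration.

```python
from collections import defaultdict


def group_by_author(commits):
    """Split (author, timestamp, lines_changed) tuples into per-author lists.

    Each author's list is sorted by timestamp, so sessions can then be
    computed independently for every author.
    """
    by_author = defaultdict(list)
    for author, timestamp, lines_changed in commits:
        by_author[author].append((timestamp, lines_changed))
    for entries in by_author.values():
        entries.sort()
    return dict(by_author)


def capped_first_commit_seconds(lines_changed, seconds_per_line, gap_since_previous):
    """Estimate a first commit's work, never exceeding the gap before it.

    gap_since_previous is None for the author's very first commit,
    in which case no cap applies.
    """
    estimate = lines_changed * seconds_per_line
    if gap_since_previous is None:
        return estimate
    return min(estimate, gap_since_previous)
```

With the cap in place, an unusually large diff after a short break cannot be credited with more time than actually passed since the previous commit.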