Take diff size into account when estimating first commit work in each session
rMazeiks opened this issue · 0 comments
For the first commit of each "coding session", we cannot tell how much time was spent on it based on the timestamps alone. Currently, the algorithm simply uses a constant to estimate the work for those commits. I believe better estimates can be easily made.
Proposed algorithm:
- Group commits into sessions, splitting wherever the time between consecutive commits is > 2h (same as current algorithm)
- Estimate average time to edit a line of code:
  - let known_work be the set of commits with known hours of work (i.e., all but the first commit in each session)
  - average time to edit a line of code = total time spent in known_work / total lines edited in known_work
- Estimate total time spent:
  - For the first commit in each session, multiply the number of lines changed in that commit by the average time to edit a line of code
  - For other commits, assume the entire duration since the previous commit was spent working (same as current algorithm)
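To make the steps above concrete, here is a minimal Python sketch. It is not the project's actual code: the `Commit` type, the `lines_changed` field (assumed to be added plus deleted lines), and the `fallback_first_commit_seconds` parameter are all hypothetical names introduced for illustration. The fallback covers the case where no commit has known work, so no per-line average can be computed.

```python
from dataclasses import dataclass

SESSION_GAP_SECONDS = 2 * 60 * 60  # the 2h session threshold from the proposal


@dataclass
class Commit:
    timestamp: int      # seconds since epoch
    lines_changed: int  # assumed metric: added + deleted lines


def estimate_total_seconds(commits, fallback_first_commit_seconds=0.0):
    """Estimate total work time, scaling first commits by diff size."""
    ordered = sorted(commits, key=lambda c: c.timestamp)
    known_seconds = 0   # time spent on commits with a known duration
    known_lines = 0     # lines changed in those same commits
    first_commits = []  # first commit of each session (duration unknown)
    previous = None
    for commit in ordered:
        gap = None if previous is None else commit.timestamp - previous.timestamp
        if gap is None or gap > SESSION_GAP_SECONDS:
            first_commits.append(commit)
        else:
            known_seconds += gap
            known_lines += commit.lines_changed
        previous = commit

    if known_lines == 0:
        # No usable average: fall back to a constant per first commit.
        return known_seconds + fallback_first_commit_seconds * len(first_commits)

    seconds_per_line = known_seconds / known_lines
    first_seconds = sum(c.lines_changed * seconds_per_line for c in first_commits)
    return known_seconds + first_seconds
```

For example, with two sessions containing 1800 seconds of known work over 60 changed lines, the average is 30 seconds per line, so first commits of 10 and 30 lines would be estimated at 300 and 900 seconds respectively.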
Other considerations and alternatives:
- Instead of time per line, we could use time per character or another metric
- Instead of estimating hours of work based on time per line, we could assume that the time spent on the first commit in a session would be similar to the time spent on other commits.
- For first commits, the estimated time may be capped by the duration since last commit
- If I understand correctly, the current algorithm assumes there is a single author. It would be nice to compute time per author. Not only would this be a more detailed metric, it would also be more accurate, since 2 people committing at the same time shouldn't be considered a single session.
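Two of these considerations, the cap and the per-author split, could look roughly like the Python sketch below. The function names and the `(author, timestamp)` input shape are hypothetical, chosen only for illustration. The idea is that each author's timestamps would be split into sessions independently, and a first-commit estimate would never exceed the gap since the previous commit.

```python
from collections import defaultdict


def group_by_author(commits):
    """Group (author, timestamp_seconds) pairs into sorted per-author timelines."""
    by_author = defaultdict(list)
    for author, timestamp in commits:
        by_author[author].append(timestamp)
    return {author: sorted(times) for author, times in by_author.items()}


def cap_first_commit_estimate(estimate_seconds, seconds_since_previous_commit):
    """Cap a first-commit estimate by the gap since the previous commit."""
    if seconds_since_previous_commit is None:
        # Very first commit: there is no earlier commit to cap by.
        return estimate_seconds
    return min(estimate_seconds, seconds_since_previous_commit)
```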