Strange behavior of since filter
cmtg opened this issue · 4 comments
Description:
It seems like the since filter is not working properly in some seemingly random cases. For example, 3 commits in the jquery-mobile repository, with both committer_date and author_date on 2011-10-05, get filtered out if the since filter is set to 2011-09-22T23:23:44.
However, if I set the filter to 2011-09-22T23:23:43.999999, the same 3 commits are returned by pydriller.
Reproduction in the Python console:
import datetime
from pydriller import Repository
repo = Repository('https://github.com/jquery/jquery-mobile.git', since=datetime.datetime(2011, 9, 22, 23, 23, 43, 999999), to=datetime.datetime(2011, 11, 1, 0, 0, 0), only_commits=['27b51c47e979917969abf3854ad9135a3704855f', '141b199224dc650f3d61a65fc91c32cdab680ee3', '114cee84fe3736c0af42211f931d7c0f1739c3d4'])
for c in repo.traverse_commits():
... print(c.hash)
... print(c.msg)
... print(c.author_date)
... print(c.committer_date)
...
27b51c47e979917969abf3854ad9135a3704855f
Fixed $.jqmData() behavior to match $.fn.jqmData()
2011-10-05 15:12:11-07:00
2011-10-05 15:12:11-07:00
141b199224dc650f3d61a65fc91c32cdab680ee3
Updated tests for $.jqmData() and $.fn.jqmData() to match the new behavior
2011-10-05 15:13:44-07:00
2011-10-05 15:13:44-07:00
114cee84fe3736c0af42211f931d7c0f1739c3d4
Merge remote branch 'upstream/master'
2011-10-05 15:14:39-07:00
2011-10-05 15:14:39-07:00
repo = Repository('https://github.com/jquery/jquery-mobile.git', since=datetime.datetime(2011, 9, 22, 23, 23, 44, 0), to=datetime.datetime(2011, 11, 1, 0, 0, 0), only_commits=['27b51c47e979917969abf3854ad9135a3704855f', '141b199224dc650f3d61a65fc91c32cdab680ee3', '114cee84fe3736c0af42211f931d7c0f1739c3d4'])
for c in repo.traverse_commits():
... print(c.hash)
...
Versions:
gitdb==4.0.10
GitPython==3.1.30
lizard==1.17.10
PyDriller==2.4
pytz==2022.7.1
smmap==5.0.0
types-pytz==2022.7.1.0
I managed to trace the problem back to GitPython and opened a corresponding ticket (gitpython-developers/GitPython#1553).
Found the solution on the GitPython side (gitpython-developers/GitPython#1553): the problem is rooted in how git's --since flag behaves. The expected behavior above can be reproduced with git's --since-as-filter flag, which is available through GitPython's since_as_filter= parameter.
It would be great if pydriller could implement that parameter as well.
Ah that makes sense.
Just a small follow-up though: this option will be much heavier than the normal --since, because it will traverse the entire history instead of stopping at the first commit older than the date.
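To make the trade-off concrete, here is a rough sketch in plain Python. It does not call git or GitPython, and it is only a simplified model of the two behaviors as described in this thread, not git's actual traversal logic. The toy history, its dates, and the function names walk_since and walk_since_as_filter are made up for illustration.

```python
from datetime import datetime

# Hypothetical history in the order a walker would visit it (newest first).
# "c3" carries an out-of-order, older date, e.g. from clock skew or a rebase.
HISTORY = [
    ("c4", datetime(2011, 10, 6)),
    ("c3", datetime(2011, 9, 1)),
    ("c2", datetime(2011, 10, 5)),
    ("c1", datetime(2011, 8, 1)),
]


def walk_since(history, since):
    """Simplified model of --since: stop at the first commit older than `since`."""
    kept, visited = [], 0
    for commit_hash, date in history:
        visited += 1
        if date < since:
            break  # everything behind this commit is never looked at
        kept.append(commit_hash)
    return kept, visited


def walk_since_as_filter(history, since):
    """Simplified model of --since-as-filter: visit everything, keep what matches."""
    kept = [commit_hash for commit_hash, date in history if date >= since]
    return kept, len(history)


cutoff = datetime(2011, 9, 22, 23, 23, 44)
print(walk_since(HISTORY, cutoff))
print(walk_since_as_filter(HISTORY, cutoff))
```

In this toy model the early-stopping walk misses "c2" because the older "c3" sits in front of it, while the filtering walk finds it at the cost of visiting every commit.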
Anyway, we can definitely add the option.
Do you feel like working on it? 🙂
Cool ... It'll take me a while to get to it, but I will send you a pull request when it's ready.
On the performance question, I am not too worried about it, since the code analysis with lizard seems to be the bottleneck in my case.