ishepard/pydriller

Strange behavior of since filter

cmtg opened this issue · 4 comments

cmtg commented

Description:

It seems like the since filter is not working properly in some 'random' cases. For example, 3 commits in the jquery-mobile repository whose committer_date and author_date are both on 2011-10-05 get filtered out if the since filter is set to 2011-09-22T23:23:44.

However, if I set the filter to 2011-09-22T23:23:43.999999, the same 3 commits are returned by pydriller.

Reproduction in the Python console:

>>> import datetime
>>> from pydriller import Repository

>>> repo = Repository('https://github.com/jquery/jquery-mobile.git', since=datetime.datetime(2011, 9, 22, 23, 23, 43, 999999), to=datetime.datetime(2011, 11, 1, 0, 0, 0), only_commits=['27b51c47e979917969abf3854ad9135a3704855f', '141b199224dc650f3d61a65fc91c32cdab680ee3', '114cee84fe3736c0af42211f931d7c0f1739c3d4'])
>>> for c in repo.traverse_commits():
...     print(c.hash)
...     print(c.msg)
...     print(c.author_date)
...     print(c.committer_date)
...
27b51c47e979917969abf3854ad9135a3704855f
Fixed $.jqmData() behavior to match $.fn.jqmData()
2011-10-05 15:12:11-07:00
2011-10-05 15:12:11-07:00
141b199224dc650f3d61a65fc91c32cdab680ee3
Updated tests for $.jqmData() and $.fn.jqmData() to match the new behavior
2011-10-05 15:13:44-07:00
2011-10-05 15:13:44-07:00
114cee84fe3736c0af42211f931d7c0f1739c3d4
Merge remote branch 'upstream/master'
2011-10-05 15:14:39-07:00
2011-10-05 15:14:39-07:00

>>> repo = Repository('https://github.com/jquery/jquery-mobile.git', since=datetime.datetime(2011, 9, 22, 23, 23, 44, 0), to=datetime.datetime(2011, 11, 1, 0, 0, 0), only_commits=['27b51c47e979917969abf3854ad9135a3704855f', '141b199224dc650f3d61a65fc91c32cdab680ee3', '114cee84fe3736c0af42211f931d7c0f1739c3d4'])
>>> for c in repo.traverse_commits():
...     print(c.hash)
...

Versions:

gitdb==4.0.10
GitPython==3.1.30
lizard==1.17.10
PyDriller==2.4
pytz==2022.7.1
smmap==5.0.0
types-pytz==2022.7.1.0

cmtg commented

I managed to trace the problem back to GitPython and opened a corresponding ticket (gitpython-developers/GitPython#1553).

cmtg commented

Found the solution on the GitPython side (gitpython-developers/GitPython#1553): the problem is rooted in how git's --since flag behaves. The expected behavior above can be reproduced with git's --since-as-filter flag, which is available in GitPython through the since_as_filter= parameter.

It would be great if pydriller could implement that parameter as well.
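To make the difference concrete, here is a minimal toy model in plain Python. It does not call git or GitPython. It assumes a linear history walked newest-first, with commit names and dates that are made up for illustration; real git traversal is more involved. The only point is that a "stop at the first older commit" rule can hide newer commits that come later in the walk, while a filter rule keeps them.

```python
from datetime import datetime
from itertools import takewhile

# Hypothetical history in traversal order; names and dates are invented.
history = [
    ("c4", datetime(2011, 10, 6)),
    ("c3", datetime(2011, 9, 1)),   # older commit date in the middle of the walk
    ("c2", datetime(2011, 10, 5)),  # newer than `since`, but visited after c3
    ("c1", datetime(2011, 8, 1)),
]

def since_stop(commits, since):
    """Toy model of --since: stop walking at the first commit older than `since`."""
    return [name for name, _ in takewhile(lambda c: c[1] >= since, commits)]

def since_as_filter(commits, since):
    """Toy model of --since-as-filter: visit every commit, keep the recent ones."""
    return [name for name, date in commits if date >= since]

since = datetime(2011, 9, 22)
print(since_stop(history, since))       # ['c4']
print(since_as_filter(history, since))  # ['c4', 'c2']
```

In this sketch `c2` is lost by the stopping rule because the walk ends at `c3`, which mirrors the kind of "missing" commits described above. The filter rule finds it, at the cost of visiting all four commits.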

Ah that makes sense.
Just a small follow-up though: this command will be much heavier than the normal --since, because it traverses the entire history instead of stopping at the first commit older than the date.

Anyway, we can definitely add the option.
Do you feel like working on it? 🙂

cmtg commented

Cool ... It'll take me a while to get to it, but I will send you a pull request when ready.

On the performance question, I am not too worried about it, since the code analysis with lizard seems to be the bottleneck in my case.