This project uses PyDriller to collect a dataset of developers and their commits on GitHub. For each commit, the dataset records the following fields: 'commit_ID', 'Author_Name', 'Authored_Date', 'email', 'msg', 'Commiter', 'committer_date', 'project_path', 'Commit_before', 'Commit_after', 'diff', 'Added_LOC', 'Removed_LOC', 'Num_LOC', 'token_counts'.
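A minimal sketch of how rows like these could be produced with PyDriller 2.x follows. The function names `commit_to_rows` and `collect` are hypothetical, and the mapping from fields to PyDriller attributes is an assumption. In particular, it assumes that 'Commit_before'/'Commit_after' hold the file contents before and after the commit (`source_code_before`/`source_code`), that 'email' is the author's email, and that 'Num_LOC' and 'token_counts' correspond to `nloc` and `token_count`. Because the before/after fields are per file, the sketch emits one row per modified file.

```python
from typing import Any, Dict, Iterator, List


def commit_to_rows(commit: Any) -> List[Dict[str, Any]]:
    """Flatten one PyDriller-style Commit into one row per modified file.

    Assumption: field-to-attribute mapping below is illustrative, not the
    project's confirmed code (e.g. Num_LOC -> nloc, token_counts -> token_count).
    """
    rows = []
    for mf in commit.modified_files:
        rows.append({
            "commit_ID": commit.hash,
            "Author_Name": commit.author.name,
            "Authored_Date": commit.author_date,
            "email": commit.author.email,          # assumed: author's email
            "msg": commit.msg,
            "Commiter": commit.committer.name,
            "committer_date": commit.committer_date,
            "project_path": commit.project_path,
            "Commit_before": mf.source_code_before,  # None if the file was added
            "Commit_after": mf.source_code,          # None if the file was deleted
            "diff": mf.diff,
            "Added_LOC": mf.added_lines,
            "Removed_LOC": mf.deleted_lines,
            "Num_LOC": mf.nloc,                      # assumed mapping
            "token_counts": mf.token_count,          # assumed mapping
        })
    return rows


def collect(repo_path: str) -> Iterator[Dict[str, Any]]:
    """Yield rows for every commit of a local repository path or remote URL."""
    # Imported lazily so commit_to_rows can be used without PyDriller installed.
    from pydriller import Repository
    for commit in Repository(repo_path).traverse_commits():
        yield from commit_to_rows(commit)
```

For example, `rows = list(collect("https://github.com/ishepard/pydriller"))` would clone the repository into a temporary directory and walk its history. Accessing `nloc` and `token_count` triggers a code analysis of each file, so this can be slow on long histories.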
Together, 'Commit_before' and 'Commit_after' let us recover the actual change a developer applied to a source file.
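If, as assumed here, 'Commit_before' and 'Commit_after' hold a file's contents before and after the commit, the changed lines can be recovered by diffing the two versions. The sketch below uses Python's standard `difflib.SequenceMatcher` rather than git's diff algorithm, so its results may occasionally differ from PyDriller's 'Added_LOC' and 'Removed_LOC'. The helper name `line_changes` is hypothetical.

```python
import difflib
from typing import List, Optional, Tuple


def line_changes(before: Optional[str], after: Optional[str]) -> Tuple[List[str], List[str]]:
    """Return (added_lines, removed_lines) between two versions of a file.

    None stands for a missing side, e.g. a newly added or deleted file.
    """
    a = (before or "").splitlines()
    b = (after or "").splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    added: List[str] = []
    removed: List[str] = []
    # Opcodes describe how to turn `a` into `b`; 'equal' blocks are skipped.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(a[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(b[j1:j2])
    return added, removed
```

For example, `line_changes("a\nb\nc\n", "a\nB\nc\nd\n")` returns `(['B', 'd'], ['b'])`: line `b` was rewritten to `B` and `d` was appended.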
PyDriller: https://github.com/ishepard/pydriller