Pandas (soon to be) deprecated methods, errors when resuming repo
Hi @gotec,
Finally using git2net again!
Describe the bug
Pandas is deprecating many functions and implicit conversions to enforce more discipline (presumably in preparation for pandas 3.0). Some of them already trigger warnings in git2net, and some already trigger errors.
Most of them are about dtypes having to be set explicitly instead of being inferred.
To Reproduce
Running the tests with pytest already triggers some of them:
gambit/algorithms/gambit.py:262: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:
df["col"][row_indexer] = value
Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
authors['author_id'][idx2] = authors.loc[idx1, 'author_id']
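A minimal sketch of the single-step fix the warning suggests for gambit.py, assuming idx1 and idx2 are single row labels. The toy authors table is hypothetical, not gambit's real data:

```python
import pandas as pd

# Hypothetical toy data standing in for gambit's `authors` table.
authors = pd.DataFrame({"name": ["alice", "a. smith", "bob"], "author_id": [0, 1, 2]})

idx1, idx2 = 0, 1  # assumption: single row labels, e.g. two aliases being merged

# Deprecated chained assignment: authors['author_id'][idx2] = ...
# A single .loc call writes into `authors` itself, also under Copy-on-Write.
authors.loc[idx2, "author_id"] = authors.loc[idx1, "author_id"]

print(authors["author_id"].tolist())  # [0, 0, 2]
```

If idx1 and idx2 were boolean masks or label arrays instead, .loc would align the right-hand Series on the index, so .to_numpy() on the right-hand side would be needed to copy values positionally.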
git2net/git2net/visualisation.py:378: FutureWarning: Series.view is deprecated and will be removed in a future version. Use ``astype`` as an alternative to change the dtype.
zip(pd.to_datetime(data.time, format='%Y-%m-%d %H:%M:%S').view('int64'),
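For the Series.view deprecation in visualisation.py, astype('int64') yields the same integers (nanoseconds since the Unix epoch). A sketch with a hypothetical toy data.time column; the extra cast to datetime64[ns] is an assumption meant to pin the unit in case newer pandas versions infer a different resolution when parsing strings:

```python
import pandas as pd

# Hypothetical stand-in for the `data.time` column used in visualisation.py.
data = pd.DataFrame({"time": ["1970-01-01 00:00:01", "2024-06-06 13:40:03"]})

times = pd.to_datetime(data.time, format="%Y-%m-%d %H:%M:%S")

# Old: times.view('int64')  -> FutureWarning: Series.view is deprecated.
# New: fix the unit to nanoseconds, then convert to integers with astype.
epoch_ns = times.astype("datetime64[ns]").astype("int64")

print(epoch_ns.iloc[0])  # 1000000000
```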
tests/test_functions.py::test_process_commit_merge
tests/test_functions.py::test_process_commit_merge
tests/test_functions.py::test_process_commit_merge2
tests/test_functions.py::test_process_commit_merge2
git2net/git2net/extraction.py:917: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'accepted' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
comp.loc[comp['_merge'] == 'both', '_action'] = 'accepted'
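The dtype warning at extraction.py:917 appears because a string is written into a float64 '_action' column. Assuming that column starts out all-NaN, one fix is to cast it to object before the assignment. The toy comp frame below, built with merge(..., indicator=True), is a hypothetical stand-in for the real one:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for `comp`: an outer merge with indicator=True adds the
# categorical `_merge` column ('left_only', 'right_only', 'both').
left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [2, 3, 4]})
comp = left.merge(right, on="key", how="outer", indicator=True)

# Assumption: `_action` starts out all-NaN, which pandas stores as float64.
comp["_action"] = np.nan

# Cast to object first, so writing strings needs no implicit (deprecated) upcast.
comp["_action"] = comp["_action"].astype(object)

with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)  # fail loudly if pandas still warns
    comp.loc[comp["_merge"] == "both", "_action"] = "accepted"

print(comp[["key", "_merge", "_action"]])
```

Alternatively, the column could be created with an object dtype from the start, which avoids the extra cast.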
There is also an error that only appears as a warning in the tests, but is raised when mining the git2net repository manually:
[2024-06-06 13:40:03] git2net:INFO Provided folder is not empty.
[2024-06-06 13:40:03] git2net:INFO Skipping the cloning and trying to resume.
[2024-06-06 13:40:03] git2net:INFO Found no database on provided path. Starting from scratch.
[2024-06-06 13:40:03] git2net:ERROR processing error: 40cc53f783aeb835fbec20f4d5e165af4e24fd32
Serial: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "git2net/blah/blah.py", line 8, in <module>
git2net.mine_github(github_url, f'cloned_repos/{git_repo_dir}', sqlite_db_file,
File "git2net/git2net/extraction.py", line 1916, in mine_github
mine_git_repo(git_repo_dir, sqlite_db_file, **kwargs)
File "git2net/git2net/extraction.py", line 1856, in mine_git_repo
_process_repo_serial(git_repo_dir, sqlite_db_file, u_commits,
File "git2net/git2net/extraction.py", line 1340, in _process_repo_serial
_log_commit_results(log, exception)
File "git2net/git2net/extraction.py", line 1308, in _log_commit_results
raise Exception(exception)
Exception: git2net/git2net/extraction.py:1257: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df_edits = pd.concat(
Two possible solutions: initialize the empty DataFrame with explicit dtypes, or test for emptiness (df.empty) to know when the concatenation is pointless.
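Both options can be sketched as follows. The column names and dtypes are hypothetical (the real df_edits in extraction.py has its own schema). Option 1 builds the empty frame with explicit dtypes; option 2 drops empty frames before calling pd.concat:

```python
import warnings

import pandas as pd

# Hypothetical schema; replace with the actual df_edits columns and dtypes.
EDIT_DTYPES = {"filename": "object", "levenshtein_dist": "float64"}


def empty_edits() -> pd.DataFrame:
    """Option 1: an empty DataFrame whose columns already carry explicit dtypes."""
    return pd.DataFrame({col: pd.Series(dtype=dtype) for col, dtype in EDIT_DTYPES.items()})


def concat_edits(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Option 2: skip empty frames, so pd.concat never has to decide whether
    to ignore them when determining the result dtypes."""
    non_empty = [df for df in frames if not df.empty]
    if not non_empty:
        return empty_edits()  # fall back to the typed empty frame
    return pd.concat(non_empty, ignore_index=True)


edits = pd.DataFrame({"filename": ["a.py"], "levenshtein_dist": [3.0]})
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)  # fail loudly if pandas still warns
    result = concat_edits([empty_edits(), edits])
print(result.dtypes)
```

Frames that are non-empty but contain all-NA columns are not filtered by this sketch and could still trigger the same warning.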
I could not reproduce it in the tests, but I could work around it by explicitly ignoring the warnings (see the lines at the beginning of the script below). Even though these are only warnings for now, it would probably be a good idea to fix them before pandas 3.0.
import warnings
# Workaround: silence all FutureWarnings, including the pandas deprecations above.
warnings.resetwarnings()
warnings.simplefilter(action='ignore', category=FutureWarning)
import os
import git2net
git_repo_dir = 'git2net'
github_url = f'gotec/{git_repo_dir}'
sqlite_db_file = f'{git_repo_dir}_git2net.db'
if not os.path.exists('cloned_repos'):
    os.makedirs('cloned_repos')
# Mine only the commit that fails with the pd.concat FutureWarning.
git2net.mine_github(github_url, f'cloned_repos/{git_repo_dir}', sqlite_db_file,
                    no_of_processes=1,
                    commits=[
                        '40cc53f783aeb835fbec20f4d5e165af4e24fd32',
                    ]
                    )
The corresponding test entry (which never fails):
def test_mine_github_prob(github_url_short, github_repo_dir, sqlite_db_file):
    git2net.mine_github(github_url_short, github_repo_dir, sqlite_db_file,
                        no_of_processes=1,
                        commits=[
                            '40cc53f783aeb835fbec20f4d5e165af4e24fd32',
                        ]
                        )
Desktop (please complete the following information):
- OS: Ubuntu 22.04
- Version: latest commit