gotec/git2net

Timestamp must either be string or int

Closed this issue · 9 comments

Hi,
I'm running the tutorial, and I get this error when running git2net.get_coediting_network

error:
AssertionError: Timestamp must either be string or int
on temporal_network.py in add_edge line 377

code:
t, node_info, edge_info = git2net.get_coediting_network(sqlite_db_file)

Further inspection shows that the variable ts does hold a value, but it is an np.float64: 1548953662.0
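A minimal sketch of why a value like this trips the assertion, assuming the check behind the error message is an isinstance test against str and int. The function check_timestamp is a hypothetical stand-in written for illustration, not pathpy's actual code:

```python
import numpy as np

ts = np.float64(1548953662.0)

# np.float64 subclasses Python's float, so it is neither a str nor an int,
# even though the value itself has no fractional part.
print(isinstance(ts, float))
print(isinstance(ts, (str, int)))


def check_timestamp(value):
    """Hypothetical stand-in for the failing check (not pathpy's real code)."""
    assert isinstance(value, (str, int)), "Timestamp must either be string or int"
    return value
```

Calling check_timestamp(ts) raises the AssertionError, while check_timestamp(int(ts)) passes.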

I have run it on two different repos, using:

  • linux
  • git version 2.12.3
  • python 3.7 & 3.9
  • jupyterlab==3.2.4
  • pygit2==1.7.1
  • python-Levenshtein==0.12.2
  • gambit-disambig==1.0.3
  • git2net==1.5.2
  • gitdb==4.0.9
  • GitPython==3.1.24

Maybe I'm using the wrong version of one of the dependencies?
Thanks!

gotec commented

Hi Lisette,

From the error message, the most likely source for this issue would be your version of pathpy. Could you tell me which version you are running?

Cheers,
Christoph

Hi Christoph,
I have pathpy2==2.2.0 which is the one git2net requires (it got automatically installed when I installed git2net).
thanks!

gotec commented

Thanks, pathpy2 should work fine. Can you confirm that the tutorial works correctly for you? Or do you also get the same issues there? If so, could you tell me which repository you are getting the error on? Then I can try to replicate the issue.

(Just to be clear, the error occurs in git2net.get_coediting_network.)

I did not clone this repo to use the tutorial.
I created my own notebook, on a new conda env using py39.
I first installed pygit2 and then git2net.
Everything ran correctly until git2net.get_coediting_network (the first code cell within 'Network Analysis and visualization').

I first tried https://github.com/mocnik-science/osm-python-tools.
I got the error, and then tried https://github.com/gotec/git2net.git, as shown in the tutorial.
I also tried on py37 and the error persisted.

I just realized that the problem is in pathpy ( /pathpy/classes/temporal_network.py) and not in git2net.
But it is still not clear to me where it gets the list of edges and timestamps from. I guess from the sqlite_db_file.
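One way to check what the mined database actually contains is to list its tables with Python's built-in sqlite3 module. This is only a sketch: list_tables is a hypothetical helper, and the table created below is a made-up example, not git2net's real schema.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an open SQLite connection."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demonstration on a throwaway in-memory database with a made-up table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE example_commits (hash TEXT, author_timestamp INTEGER)")
print(list_tables(demo))
```

For a mined repository, the same helper could be called on sqlite3.connect(sqlite_db_file) to see which tables hold the commit data.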

As a quick and dirty workaround, I added this at line 377 ( /pathpy/classes/temporal_network.py):
ts = int(ts) if type(ts) == _np.float64 else ts

Then, I could reproduce everything as shown in your tutorial.
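The same cast can also be applied before the values reach pathpy, without editing the installed package. A minimal sketch, assuming the timestamps are whole Unix seconds; normalize_timestamp is a hypothetical helper and is not part of git2net or pathpy:

```python
import numpy as np


def normalize_timestamp(ts):
    """Return ts as a plain int when it is a NumPy integer or a whole-number float."""
    if isinstance(ts, np.integer):
        return int(ts)
    if isinstance(ts, (float, np.floating)):
        # Refuse to silently truncate values such as 1.5 or NaN.
        if not float(ts).is_integer():
            raise ValueError(f"Timestamp {ts!r} is not a whole number")
        return int(ts)
    # Strings and plain ints are passed through unchanged.
    return ts
```

Unlike an unconditional int(ts), this raises on NaN or fractional values instead of hiding them.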
However, when I re-ran a previous cell, I got another error (will add it as another issue).

Just in case, I'm sharing the list of packages that my environment has:
requirements.txt

Thanks!

gotec commented

Unfortunately, I am not able to reproduce your issue, which makes solving it challenging :)

Running the following code yields correct temporal networks for both repositories on my setup:

import git2net

# repo_path = 'git2net4analysis'
# db_path = 'git2net_mined.db'

repo_path = 'osm-python-tools'
db_path = 'osm-python-tools_mined.db'

git2net.mine_git_repo(repo_path, db_path)

net, _, _ = git2net.get_coediting_network(db_path)
net

The only thing I did before running the code was to manually clone the repositories to the respective folders.
Can you check if this code works for you? Could you share a minimal example of your setup so I can try to replicate the issue with that? Maybe I misunderstood something from your descriptions.

I could not find any differences in the packages either.

Cheers,
Christoph

Hi, thanks for looking into this.
I tried what you suggested:

  1. Clone the repo manually (not using the function from pygit2)
  2. Run the code you suggested. Here I got an error that the author ids were not set, and that I needed to run the disambiguation first.
    Exception: The author_id is not yet computed. To use author_id as identifier, please run git2net.disambiguate_aliases_db on the database before visualisation.
  3. I added the disambiguation call, ran it again, and the error about the timestamp still showed up :-(

This is my code:

import git2net
import os 

local_directory = '../datasets/github/git2net/'
repo_path = os.path.join(local_directory,'osm-python-tools')
db_path = os.path.join(local_directory,'osm-python-tools.db')

git2net.mine_git_repo(repo_path, db_path)
git2net.disambiguate_aliases_db(db_path)

net, _, _ = git2net.get_coediting_network(db_path)
net

Somewhere in between, the timestamp is being set to float.
Sometimes pandas does that with integer columns. My guess is that that might be the issue.
Just in case, I'm using pandas==1.3.4
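The pandas behaviour mentioned here is easy to reproduce: an integer column is upcast to float64 as soon as a missing value appears in it. The sketch below only illustrates that general behaviour; whether this is what happens inside git2net 1.5.2 is a guess, as stated above.

```python
import pandas as pd

# Two whole-second Unix timestamps stored as integers.
timestamps = pd.Series([1548953662, 1548953700])
print(timestamps.dtype)

# Reindexing adds a row with no value; NaN cannot live in an int64 column,
# so pandas upcasts the whole column to float64.
with_gap = timestamps.reindex([0, 1, 2])
print(with_gap.dtype)

# Dropping the missing row and casting back restores integer timestamps.
restored = with_gap.dropna().astype("int64")
print(restored.dtype)
```

Any value read from with_gap is an np.float64, which matches the type reported for ts earlier in the thread.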

Meanwhile, I will use my "quick and dirty" solution of casting the value to int if it is float.
Thanks!

gotec commented

Hi Lisette,

I think I've figured out why I couldn't replicate this issue. I was using the current version of git2net from GitHub rather than the version on PyPI. I had wrongly assumed they were identical, but it appears that I have already fixed the issue in the GitHub version.

Before I submit a new version of git2net to PyPI, could you try the following and confirm whether it works:

  1. Uninstall git2net from your machine (pip uninstall git2net)
  2. Clone the git2net repository into a local folder (git clone https://github.com/gotec/git2net)
  3. Navigate to the folder you cloned git2net into (cd git2net)
  4. Install git2net from the local folder (pip install -e .)
  5. Now run your code from yesterday again:
import git2net
import os 

local_directory = '../datasets/github/git2net/'
repo_path = os.path.join(local_directory,'osm-python-tools')
db_path = os.path.join(local_directory,'osm-python-tools.db')

git2net.mine_git_repo(repo_path, db_path)
git2net.disambiguate_aliases_db(db_path)

net, _, _ = git2net.get_coediting_network(db_path)
net

Please let me know if this works. In that case I will publish a new version with the fixes to PyPI later today.

Cheers,
Christoph

Hi,
Yes, that solved the problem. I ran the same code and the timestamp error didn't show up!
Thanks!

This is the output:
image

gotec commented

resolved in git2net 1.5.3
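To check whether an environment already has the fixed release, the installed version can be compared against 1.5.3. A small sketch, assuming Python 3.8+ (for importlib.metadata) and plain numeric version strings such as "1.5.2"; parse_version and has_timestamp_fix are hypothetical helpers, not part of git2net:

```python
from importlib import metadata

FIXED_IN = (1, 5, 3)  # release named in this thread as containing the fix


def parse_version(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".")[:3])


def has_timestamp_fix(version):
    """Return True if the given version is 1.5.3 or newer."""
    return parse_version(version) >= FIXED_IN


try:
    installed = metadata.version("git2net")
    print(installed, has_timestamp_fix(installed))
except metadata.PackageNotFoundError:
    print("git2net is not installed in this environment")
```

If the check reports an older version, pip install --upgrade git2net should pull in the fixed release.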