AaronWard/covidify

Could not pull from https://github.com/CSSEGISandData/COVID-19.git

Closed this issue · 8 comments

It has stopped working (I just ran it again). It cannot pull from the repo and there is no data in covidify-output. The run ends with an I/O error, so has something gone wrong with the path or the data permissions?

DATA SOURCE: git

Data Extraction

git pull from https://github.com/CSSEGISandData/COVID-19.git
Could not pull from https://github.com/CSSEGISandData/COVID-19.git

Data Exploration

Importing Data...
Traceback (most recent call last):
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/covidify/data_exploration.py", line 45, in <module>
    agg_df = pd.read_parquet(os.path.join(data_dir, agg_file))
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pandas/io/parquet.py", line 310, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pandas/io/parquet.py", line 124, in read
    result = self.api.parquet.read_table(
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pyarrow/parquet.py", line 1271, in read_table
    pf = ParquetDataset(source, metadata=metadata, memory_map=memory_map,
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pyarrow/parquet.py", line 1028, in __init__
    self.metadata_path) = _make_manifest(
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pyarrow/parquet.py", line 1228, in _make_manifest
    raise IOError('Passed non-file path: {0}'
OSError: Passed non-file path: /Users/chriswarner/Desktop/covidify-output/data/2020-03-16/agg_data_2020-03-16.parquet.gzip

  • Has it worked before, and only stopped working now?
  • What OS are you using?

Could you try cloning from that COVID-19 repo directly, to make sure it's not a problem with your git installation?
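A quick way to run that check without a full clone is to ask git whether it can list the remote's refs. The snippet below is only a sketch: `can_reach_remote` is a hypothetical helper, not part of covidify, and it assumes the `git` executable is on your PATH.

```python
import os
import subprocess


def can_reach_remote(url, timeout=60):
    """Return True if `git ls-remote` can list refs at `url`.

    Hypothetical helper for illustration; it is not part of covidify.
    """
    # Disable interactive credential prompts so the call cannot hang.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", url],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            env=env,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    url = "https://github.com/CSSEGISandData/COVID-19.git"
    print("remote reachable:", can_reach_remote(url))
```

If this prints `False` while the repo opens fine in a browser, the problem is more likely on the local git or network side than in covidify itself.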

I'm running covidify daily with source=git with no problems. Maybe there was a temporary problem with pulling the source git repo?

Just a quick update, and sorry for the delay: it works again. I just tested and nothing has changed, so it must have been an issue with the external repo? Thanks again. Chris.

I get the same error and it still does not work. Why?

I assume it has something to do with this

GitHub has a rate limit of 5000 requests per repo per hour. So my assumption is that as the data source repo gains popularity, the bandwidth is being used up.

A way around this might be to run your extraction at the start of the hour, when the limits have been reset.

I was having similar errors:

###
### Data Extraction
###
git pull from https://github.com/CSSEGISandData/COVID-19.git
Could not pull from https://github.com/CSSEGISandData/COVID-19.git

I found that my `/tmp/corona/COVID-19` repo had become corrupted. I deleted it and it worked again.
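For anyone who wants to script that fix, here is a rough sketch of the idea: run `git fsck` against the cached clone and delete the directory if the check fails, so that the next run starts from a fresh clone. The helper names are hypothetical (nothing here comes from covidify), and the path is the one from my machine, so adjust it to wherever your clone lives.

```python
import shutil
import subprocess
from pathlib import Path


def repo_is_healthy(repo_dir):
    """Return True if repo_dir contains a .git directory that passes `git fsck`."""
    git_dir = Path(repo_dir) / ".git"
    if not git_dir.exists():
        return False
    # --git-dir pins the check to this clone instead of a parent repository.
    result = subprocess.run(
        ["git", "--git-dir", str(git_dir), "fsck"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def remove_if_corrupted(repo_dir):
    """Delete repo_dir if it exists but fails the health check.

    Returns True when the directory was deleted, False otherwise.
    """
    repo_dir = Path(repo_dir)
    if repo_dir.exists() and not repo_is_healthy(repo_dir):
        shutil.rmtree(repo_dir)
        return True
    return False


if __name__ == "__main__":
    # Path taken from the comment above; yours may differ.
    print("removed:", remove_if_corrupted("/tmp/corona/COVID-19"))
```

Note that `git fsck` can take a while on a large repository, and this deletes the whole directory without asking, so only point it at a cache you are happy to re-download.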