Error when downloading hdx population density maps

Question

Opened this issue 2 years ago · 7 comments

Hi 👋🏻 hoping for some help on an error I'm running into when using urbanpy

Using code from the urbanpy_workshop.ipynb when I adapt the line below for Brazil:

pop_brazil = up.download.hdx_fb_population('brazil', 'full')

I get the error

ValueError                                Traceback (most recent call last)
Cell In [30], line 1
----> 1 pop_brazil = up.download.hdx_fb_population('brazil', 'full')

File /opt/homebrew/lib/python3.10/site-packages/urbanpy/download/download.py:182, in hdx_fb_population(country, map_type)
    180     return pd.concat([pd.read_csv(file) for file in dataset_dict[country][map_type]])
    181 else:
--> 182     return pd.read_csv(dataset_dict[country][map_type])

File /opt/homebrew/lib/python3.10/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File /opt/homebrew/lib/python3.10/site-packages/pandas/util/_decorators.py:317, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    311 if len(args) > num_allow_args:
    312     warnings.warn(
    313         msg.format(arguments=arguments),
    314         FutureWarning,
    315         stacklevel=find_stack_level(inspect.currentframe()),
    316     )
--> 317 return func(*args, **kwargs)

File /opt/homebrew/lib/python3.10/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
...
-> 1744     raise ValueError(msg)
   1746 try:
   1747     return mapping[engine](f, **self.options)

ValueError: Invalid file path or buffer object type: <class 'list'>

When I substitute 'children' or 'youth' for 'full', the code runs fine. It's hard to tell from the metadata for this HDX data what the new keyword for "full population data" might be. Do you have any ideas for how I can get the full population data?

Note this also means the line below from the tutorial does not work:

pop_arg = up.download.hdx_fb_population('argentina', 'full')

Instead, it gives a 404 error.

Answer 1 · 2023-01-04T15:12:17.000Z

Hi there!

It seems that data has been moved around at the source. Since the Humanitarian Data Exchange does not provide an API to access datasets, we have to find the URLs and hard code them. When data is updated or moved, those links no longer work... oh well.

You can still get the data with up.download.hdx_dataset() (there's an example in the same tutorial, look for "To access these data, we will use another function of UrbanPy by performing a manual search in the online repository")

Right now, population estimates for Argentina are here, and full population data is at "arg_general_2020_csv.zip", linked to https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip.

So you should be able to download it using:

up.download.hdx_dataset('https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip')

Thanks for the heads up!

Answer 2 · 2023-01-13T10:33:33.000Z

Hi @robcrystalornelas, thanks for your issue.

For the next version of urbanpy, I'm currently updating this function to use the HDX API on the backend so that the data links are updated automatically.

The problem occurs because some of the population data links we set manually no longer work. As @bitsandbricks (thanks!) mentioned, for the moment you can use the up.download.hdx_dataset function to download any CSV dataset from HDX.

Answer 3 · 2023-01-16T23:36:08.000Z

@bitsandbricks and @Claudio9701 👋🏻 Thanks so much for the additional info. @bitsandbricks that code actually doesn't work for me in my own Jupyter notebook. I also tried it with the Brazil data. @Claudio9701 does

up.download.hdx_dataset('https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip')

work for you? I get another 404 error.

Answer 4 · 2023-01-17T10:05:22.000Z

Hello @robcrystalornelas

You're totally right! There is a problem with the provided line of code that I didn't catch at first. The hdx_dataset function receives the dataset id, not the full URL. For example, go to the Argentina population dataset on HDX, right-click the specific dataset you want, and copy its link (see figure below).

Argentina: High Resolution Population Density Maps + Demographic Estimates

The data link for the overall population density dataset is this one:

https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip

To call the function, copy only the part after "https://data.humdata.org/dataset/" and pass it as the resource argument. The call ends up as:

arg_pop = up.download.hdx_dataset(resource="6cf49080-1226-4eda-8700-a0093cbdfe4d/resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip")
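If you'd rather not trim the link by hand, the same step can be sketched in a few lines. This is only an illustration: hdx_resource_from_url and HDX_DATASET_PREFIX are hypothetical names, not part of urbanpy, and the sketch assumes the copied link always starts with the prefix quoted above.

```python
# Hypothetical helper (not part of urbanpy): strip the HDX dataset prefix
# from a copied download link to get the value for the `resource` argument.
HDX_DATASET_PREFIX = "https://data.humdata.org/dataset/"


def hdx_resource_from_url(url: str) -> str:
    """Return the part of an HDX download URL that follows the dataset prefix."""
    if not url.startswith(HDX_DATASET_PREFIX):
        raise ValueError(f"Not an HDX dataset URL: {url!r}")
    return url[len(HDX_DATASET_PREFIX):]


url = (
    "https://data.humdata.org/dataset/6cf49080-1226-4eda-8700-a0093cbdfe4d/"
    "resource/5737d87f-e17f-4c82-b1bd-d589ed631318/download/arg_general_2020_csv.zip"
)
resource = hdx_resource_from_url(url)
print(resource)
```

The resulting string is the same one passed as resource in the call above.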

In this notebook you can test the solution and generate a population density map like the one below:

To make it work with Brazil, you have to download the parts of the country that contain the city you want to analyze. Since it is a really big country, the data is divided into 4 regions.
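If a study area spans more than one region, the parts can be stacked into a single table. The following is a rough sketch, not urbanpy code: load_regions is a hypothetical helper, and it assumes the download function you pass in (for example up.download.hdx_dataset) returns a pandas DataFrame for each resource.

```python
import pandas as pd


def load_regions(resources, download):
    """Download each regional resource and stack the results into one DataFrame.

    `download` is any callable that takes a resource string and returns a
    DataFrame. Using up.download.hdx_dataset here is an assumption.
    """
    frames = [download(resource) for resource in resources]
    # ignore_index=True gives the combined table a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

With urbanpy imported as up, this would be called as load_regions(parts, up.download.hdx_dataset), where parts is a list of the resource strings copied from the Brazil dataset page for the regions you need.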

Thank you so much for being one of the early adopters of urbanpy! I would really love to have a brief online meeting when you are free!

Answer 5 · 2023-01-19T18:50:57.000Z

Excellent, thanks @Claudio9701!

I updated the code as you suggested and it worked great ✨

And yes, happy to schedule a time to chat about my work with urbanpy so far.

Answer 6 · 2023-02-15T21:42:46.000Z

The problem is just line 179 of the file urbanpy/download/download.py:

# Brazil is split into 4 maps
if isinstance(type(dataset_dict[country][map_type]), list):
    return pd.concat([pd.read_csv(file) for file in dataset_dict[country][map_type]])
else:
    return pd.read_csv(dataset_dict[country][map_type])

It must be:

# Brazil is split into 4 maps
if isinstance(dataset_dict[country][map_type], list):
    return pd.concat([pd.read_csv(file) for file in dataset_dict[country][map_type]])
else:
    return pd.read_csv(dataset_dict[country][map_type])
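A quick way to see why the original check never takes the list branch: type(...) returns the class itself, and a class is not a list instance. The check is therefore always False, so the list of links goes straight to pd.read_csv and raises the ValueError from the traceback. A minimal illustration in plain Python, where sources and its file names are placeholders, not urbanpy data:

```python
# Placeholder list standing in for dataset_dict[country][map_type]
sources = ["part_1.csv", "part_2.csv"]

# Buggy check: type(sources) is the class `list`, which is not an instance of list
buggy_check = isinstance(type(sources), list)

# Fixed check: tests the object itself
fixed_check = isinstance(sources, list)

print(buggy_check, fixed_check)  # False True
```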

Answer 7 · 2023-02-15T22:47:48.000Z

@biodatasciencearg thanks for your comment! 🙏🏽

On the master branch, this problem is now fixed as you suggested.

... 
if isinstance(ids, list) and len(ids) > 1:
...

I'm working on testing all the minor fixes and new functions so they can be published to PyPI.