AtlasOfLivingAustralia/galah-python

status always skipped when using galah.atlas_occurrences()

JojoReikun opened this issue · 5 comments

Hey,

I know this Python package is quite new and still under development, and I think it's awesome that you're working on a Python version of galah!!! I am currently working on a Python project that uses ALA data and would love to use this package, instead of downloading the data with an R package and then handling it in Python.

The issue I am facing is that every attempt to download data fails with an error about the JSON response status being "skipped". I have tried many different taxa, filters, etc.

I have configured galah to the Australian Atlas and put in my email that I used for ALA registration. The data_profile is set to "ALA".

The current code I am trying:
df_counts_project = galah.atlas_occurrences(taxa="Phascolarctos cinereus", use_data_profile=True)

The error message:
Traceback (most recent call last):
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch_main_.py", line 11, in <module>
    main()
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch_main_.py", line 7, in main
    download_ala_data()
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch\operations\galah_data_download.py", line 48, in download_ala_data
    df_counts_project = galah.atlas_occurrences(taxa="Phascolarctos cinereus", use_data_profile=True)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\galah\atlas_occurrences.py", line 243, in atlas_occurrences
    if response.json()['status'] == "skipped":
KeyError: 'status'

Any feedback on this would be appreciated :)
Thanks!

Hey Jojo,

Thank you for taking the time to comment! As the one who's written the lion's share of the package, it's great to hear there are users out there that are excited by this package!

Hm, when I've used it, it gives me ~203,000 occurrence records. To try and help you, I have a couple of questions and the code I used:

>>> galah.galah_config(atlas="Australia", email = "amanda.buyan@csiro.au")
>>> galah.atlas_occurrences(taxa="Phascolarctos cinereus", use_data_profile=True)

Are you configuring galah as in the code above, or slightly differently? Is atlas_occurrences() the only function that isn't working?

Hey,

thanks for getting back to me so quickly! Okay, it's good to hear that it works for you! Must be something on my end then!

I have configured it as follows:
>>> galah.galah_config(atlas="Australia", email="schjojoultz@gmail.com", data_profile="ALA")
>>> df_counts_project = galah.atlas_occurrences(taxa="Phascolarctos cinereus", use_data_profile=True)

I actually fixed it!

The ALA site says you can log in with an existing Google/Facebook account to skip the sign-up process. But that doesn't seem to be enough to make that email usable in the galah_config statement!

I have signed up again using a different email and received the confirmation link. Using that email, I can now successfully search the occurrences!
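For anyone hitting the same KeyError: the traceback above shows the lookup response.json()['status'] failing because the response had no 'status' key at all. A minimal sketch of a more defensive check is below. The function name describe_status and both sample payloads are hypothetical, made up for illustration; they are not part of galah and not real ALA responses.

```python
from typing import Any, Mapping


def describe_status(payload: Mapping[str, Any]) -> str:
    """Summarise a status payload without assuming the 'status' key exists."""
    status = payload.get("status")  # returns None instead of raising KeyError
    if status is None:
        # Hypothetical handling: report which keys the server did send back.
        return f"no 'status' in response; keys received: {sorted(payload)}"
    if status == "skipped":
        return "request was skipped by the server"
    return f"status: {status}"


# Hypothetical payloads, for illustration only (not real ALA responses).
print(describe_status({"status": "skipped"}))
print(describe_status({"message": "email not registered"}))
```

Reporting the keys that were received makes it easier to tell a rejected request (for example, an unregistered email) apart from a genuinely skipped download.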

While I'm at it: which filter field would I have to use to select a specific database?

@JojoReikun here's some code

>>> galah.search_all(fields="data")
>>> galah.show_values(field="datasetID")
>>> galah.atlas_counts(filters="datasetID=SU")

There is a datasetName field; however, you currently can't display the values from that one (I'm currently fixing it and it will be in the next release).

Yep, I found the datasetName field before, which is what I have tried.

The workaround you posted throws a few errors of its own. I am trying to track down the cause by looking through the source code, but maybe you already have an idea, so I'm posting the messages here.

Since I need to find the datasetID first, I am currently only running:
galah.search_all(fields="data")
galah.show_values(field="datasetID", verbose=True)

Error:
Traceback (most recent call last):
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 969, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 1017, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 2 columns passed, passed data had 4 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch_main_.py", line 11, in <module>
    main()
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch_main_.py", line 7, in main
    download_ala_data()
  File "D:\Jojo\DDC\KoalaWatchDashboard\KoalaDashboardScript\KoalaWatch\operations\galah_data_download.py", line 50, in download_ala_data
    galah.show_values(field="datasetID", verbose=True)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\galah\show_values.py", line 94, in show_values
    tempdf = pd.DataFrame([entry['i18nCode'].split('.')],columns=['field','category'])
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\frame.py", line 746, in __init__
    arrays, columns, index = nested_data_to_arrays(
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 510, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 875, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "C:\Users\JojoS\Miniconda3\envs\KoalaDashboardScript\lib\site-packages\pandas\core\internals\construction.py", line 972, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 2 columns passed, passed data had 4 columns
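
The failing line in show_values.py splits entry['i18nCode'] on every dot and then expects exactly two pieces, so a code containing more than one dot yields four pieces for two columns. The sketch below reproduces that failure with pandas alone and shows one possible workaround: splitting on the first dot only. The helper split_code and the sample code string are hypothetical, and whether "everything after the first dot" is the right category for galah is an assumption, not the maintainers' fix.

```python
import pandas as pd


def split_code(code: str) -> pd.DataFrame:
    """Split 'field.category' on the first dot only; later dots stay in category."""
    parts = code.split(".", 1)
    if len(parts) == 1:
        parts.append("")  # no dot at all: leave the category empty
    return pd.DataFrame([parts], columns=["field", "category"])


# Hypothetical i18nCode value with more than one dot (not taken from the ALA).
code = "datasetID.dr.1.2"

try:
    # Same pattern as the failing line: an unbounded split gives four pieces.
    pd.DataFrame([code.split(".")], columns=["field", "category"])
except ValueError as err:
    print(f"unbounded split fails: {err}")

# Bounded split always gives exactly two pieces.
print(split_code(code))
```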