is API access with JWT tokens working yet?
Closed this issue · 11 comments
Hey acbuyan,
I have been silent for a while as our project was interrupted for a few months, but I am now back working on an automated download of the BioCollect data for a specific project, including all available fields as well as the media (photos of the koalas).
First I generated a list of all field names (n=760), to use these rather than the "basic" setting when downloading atlas occurrences for my research UID.
After receiving the error "Expecting value: line 1 column 1 (char 0)" (a JSON decode error in the request call in atlas_occurrences.py), I added a line to that Python module to print the raw server response.
That showed me that querying all 760 fields produced a URL too long for the server to handle (an HTTP 414 error).
I therefore wrote a loop to download the fields in chunks and stitch the data back together. However, this caused a new error (now status: in Queue), thrown at line 311:
return pd.read_csv(zipfile.ZipFile(io.BytesIO(zipURL.content)).open('data.csv'), low_memory=False)
Apparently pandas raised the error "No columns to parse from file".
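A simplified sketch of that chunk-and-stitch approach (the names `chunk_fields` and `download_in_chunks` and the chunk size are illustrative, not the actual script; the merge also assumes every chunked call returns rows in the same order):

```python
def chunk_fields(fields, size=100):
    """Split a long field list into chunks small enough to keep
    each request URL under the server's 414 length limit."""
    return [fields[i:i + size] for i in range(0, len(fields), size)]

def download_in_chunks(fields, **kwargs):
    """Illustrative only: call galah.atlas_occurrences once per chunk
    and stitch the resulting column blocks back together side by side."""
    import galah        # assumed installed; requires network access
    import pandas as pd
    frames = [galah.atlas_occurrences(fields=chunk, **kwargs)
              for chunk in chunk_fields(fields)]
    # assumes each chunked call returns the same records in the same order
    return pd.concat(frames, axis=1)
```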
With this I also received an automatic email from ALA Support saying "Occurrence download failed - data". I replied to that but haven't heard back yet.
When I read through the ALA API docs, I found that the API I (likely) need to access all occurrence records of that project may be protected: /occurrences/{recordUuid} (retrieve full record details).
I am unsure if this is the correct API though.
It also seems that JWT token access to protected APIs is included in galah-python but not yet functional, judging from the galah_config.py and generate_jwt_token.py scripts?
I have already applied for and received a Client ID and Secret, and would love to see if these allow me to get the required project data...
You have done such incredible work with this package already that I would rather ask how far JWT tokens are implemented here, and whether this is the way to solve my access and download issues, which reach far beyond the "basic" field list.
I am not a software developer and don't have experience with APIs; I am more of a data analyst. In this case, however, I can't reach the data I need to analyse, haha, so if there is any way I could help with this issue or further development, please let me know.
Hope you can point me in the correct direction, happy to give more specific project details if needed.
Thanks again!
Jojo
Hi Jojo,
Good to hear from you, and great to hear that you continue to use galah-python! It makes my day :)
First things first: there have been a couple of updates to galah-python; have you updated to the latest version? I don't think this will solve your issue, but I have fixed some bugs along the way and added other options too.
As for your question on JWT tokens: we're holding off on JWT token integration until we deploy Cognito. That should happen soon, but we don't have a firm date. In addition, JWT tokens are only for access to sensitive data, and only data that you have personally been approved to receive. If those conditions don't apply to you, then JWT tokens are not what's causing your problem.
Unfortunately, galah-python doesn't support BioCollect APIs, so you can only use galah to get occurrences once they have been passed to biocache.
I may have asked you this before, but why are you downloading all fields? It is never a good idea to download all of the ALA fields: they include spatial data that cannot be relevant to every application, as well as Darwin Core fields that are rarely populated and may not be useful to you. If you know which data fields you need, you can pass them as an argument to atlas_occurrences.
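For example, a request restricted to a handful of fields might look roughly like this (a sketch under assumptions: the taxon, email address, and field names are placeholders, and it requires galah-python installed plus an ALA-registered email):

```python
def download_selected_fields():
    """Illustrative sketch: request only the columns you actually need
    instead of all 760 fields. Requires galah-python and network access."""
    import galah  # assumed installed
    galah.galah_config(email="you@example.org")   # placeholder ALA email
    return galah.atlas_occurrences(
        taxa="Phascolarctos cinereus",            # koala; placeholder taxon
        fields=["decimalLatitude", "decimalLongitude", "eventDate"],
    )
```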
Hi again Jojo,
I get what you mean about the headings. If you're not overly familiar with the concept of Darwin Core (https://dwc.tdwg.org/) headings, it can feel overwhelming.
Out of curiosity, have you tried the search_all function in galah? It could help you narrow down your field search. You can use show_all to show all the terms, and search_all to find terms with keywords either in the term (id) itself or in the description. It looks like this:
id description type link
0 _nest_parent_ NaN field NaN
1 _nest_path_ NaN field NaN
2 _root_ NaN field NaN
3 abcdTypeStatus ABCD field in use by herbaria field NaN
4 acceptedNameUsage http://rs.tdwg.org/dwc/terms/acceptedNameUsage field NaN
.. ... ... ... ...
755 multimediaLicence Media filter field media
756 images Media filter field media
757 videos Media filter field media
758 sounds Media filter field media
759 qid Reference to pre-generated query other
[760 rows x 4 columns]
>>> galah.search_all(fields="latitude")
id description type link
0 decimalLatitude The decimal latitude associated with this reco... field https://github.com/AtlasOfLivingAustralia/ala-...
1 verbatimLatitude http://rs.tdwg.org/dwc/terms/verbatimLatitude field NaN
2 raw_decimalLatitude The decimal latitude as supplied by the data p... field NaN
As far as getting the data goes, I don't think the problem was with what you were doing; I think it was with how galah-python was constructing the URLs. I've patched it and pushed my changes to the Python Package Index, so all you should need to do is update your installation and it should work.
Yes - do you mind double-checking which version of galah-python you have? It should be 0.8.3. If it is that version and it is still not working, then I'll have to go back to the drawing board.
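One way to check programmatically (a minimal sketch using only the standard library; it assumes the package is installed under its PyPI distribution name, galah-python):

```python
from importlib.metadata import PackageNotFoundError, version

def version_tuple(v):
    """Turn a release string like '0.8.3' into (0, 8, 3) for comparison."""
    return tuple(int(part) for part in v.split("."))

def needs_upgrade(required="0.8.3", dist="galah-python"):
    """Return True if the package is missing or older than `required`."""
    try:
        installed = version(dist)  # assumed PyPI distribution name
    except PackageNotFoundError:
        return True
    return version_tuple(installed) < version_tuple(required)
```

If `needs_upgrade()` returns True, a `pip install --upgrade galah-python` should bring in the patched release.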
Ok, this might sound weird, but... try putting decimalLatitude at the beginning of your list of terms and see if that works.
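In code, the workaround is just reordering the list before making the call (`front_load` is an illustrative name, not part of galah-python):

```python
def front_load(fields, first="decimalLatitude"):
    """Move `first` to the front of the field list (adding it if absent),
    per the workaround described above."""
    return [first] + [f for f in fields if f != first]
```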
That works!
Interesting fix :)
What did that change?
I don't know - I'll ask the systems team and see why.
Thank you!
Hi Jojo,
Wanted to let you know that the problem is on the ALA side of things. I'm not sure when a fix is coming, so for now keep using the decimalLatitude workaround. I'm going to close the issue for now, but do let me know if anything else comes up!