Steps
-
Download the iNaturalist Open Data metadata from https://github.com/inaturalist/inaturalist-open-data
- Instructions are in Metadata/Download.
- Use commands of the form
  aws s3 cp s3://inaturalist-open-data/photos.csv.gz photos.csv.gz
- The photos.csv.gz file is large and takes a while to download.
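A minimal Python sketch of this step. It builds one `aws s3 cp` command per metadata file and runs them with subprocess. It assumes the three tables used below are named photos.csv.gz, observations.csv.gz and taxa.csv.gz at the top level of the bucket, mirroring the command above. `metadata_commands` and `download_metadata` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

BUCKET = "s3://inaturalist-open-data"
# Assumed file names, following the photos.csv.gz pattern shown above.
METADATA_FILES = ["photos.csv.gz", "observations.csv.gz", "taxa.csv.gz"]


def metadata_commands(dest_dir="metadata", files=METADATA_FILES):
    """Build (but do not run) one `aws s3 cp` command per metadata file."""
    dest = Path(dest_dir)
    return [["aws", "s3", "cp", f"{BUCKET}/{name}", str(dest / name)] for name in files]


def download_metadata(dest_dir="metadata"):
    """Create dest_dir and download each file; check=True stops on the first failure."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for cmd in metadata_commands(dest_dir):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print the commands first and check them before starting the long photos.csv.gz download.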
-
Open the metadata files.
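A minimal pandas sketch, assuming the metadata files are tab-separated despite their .csv extension and carry the column names listed in `USECOLS` (my reading of the inaturalist-open-data README; check the real headers). Loading only the needed columns keeps memory down, which matters most for the photos table. `load_metadata` and `USECOLS` are hypothetical names.

```python
import pandas as pd

# Hypothetical column subsets per table; adjust to the real headers.
USECOLS = {
    "taxa": ["taxon_id", "ancestry", "rank", "name"],
    "observations": ["observation_uuid", "taxon_id"],
    "photos": ["photo_id", "observation_uuid", "extension"],
}


def load_metadata(dest_dir="metadata", usecols=USECOLS):
    """Read each <name>.csv.gz in dest_dir into a DataFrame, keyed by table name."""
    tables = {}
    for name, cols in usecols.items():
        # sep="\t" is an assumption about the file format; gzip is inferred
        # from the .gz suffix by pandas.
        tables[name] = pd.read_csv(f"{dest_dir}/{name}.csv.gz", sep="\t", usecols=cols)
    return tables
```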
-
Get the Eucalyptus taxon and all of its child taxa.
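A minimal sketch of this step, assuming taxa.csv has the columns taxon_id, ancestry, rank and name, where ancestry is a "/"-separated string of the taxon's ancestor ids. That column layout is my assumption from the inaturalist-open-data README. `subtree_ids` is a hypothetical helper.

```python
import pandas as pd


def subtree_ids(taxa: pd.DataFrame, name="Eucalyptus", rank="genus") -> set:
    """Return the taxon_id of the named taxon plus the ids of all its descendants."""
    root = taxa.loc[(taxa["name"] == name) & (taxa["rank"] == rank), "taxon_id"]
    if root.empty:
        raise ValueError(f"{rank} {name!r} not found in taxa table")
    root_id = int(root.iloc[0])
    # A taxon descends from the root if the root's id is one component of its
    # ancestry path (whole-component match, not a substring match).
    parts = taxa["ancestry"].fillna("").astype(str).str.split("/")
    is_child = parts.apply(lambda p: str(root_id) in p)
    return {int(t) for t in taxa.loc[is_child, "taxon_id"]} | {root_id}
```

Splitting on "/" and comparing whole components avoids false matches such as id 47 appearing inside 147.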
-
Join these taxa with the observations table on taxon_id.
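Since only the matching observation rows are needed here, this join can be sketched as a semi-join with `isin`. It is equivalent to an inner join on taxon_id that keeps just the observation columns. `observations_for` is a hypothetical helper.

```python
import pandas as pd


def observations_for(taxon_ids, observations: pd.DataFrame) -> pd.DataFrame:
    """Keep the observations whose taxon_id is in taxon_ids."""
    mask = observations["taxon_id"].isin(set(taxon_ids))
    return observations[mask].reset_index(drop=True)
```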
-
Then join the result with the photos table on observation_uuid, and collect the photo_id and extension of each photo in a DataFrame.
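A minimal sketch of the photo join, assuming the photos table has photo_id, observation_uuid and extension columns. `photos_for` is a hypothetical helper. One observation can have several photos, so the result can have more rows than the observation subset.

```python
import pandas as pd


def photos_for(observations: pd.DataFrame, photos: pd.DataFrame) -> pd.DataFrame:
    """Inner-join photos to observations on observation_uuid; keep photo_id and extension."""
    keys = observations[["observation_uuid"]].drop_duplicates()
    merged = photos.merge(keys, on="observation_uuid", how="inner")
    return merged[["photo_id", "extension"]].reset_index(drop=True)
```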
-
Download 1000 photos by running aws s3 cp through subprocess.run on each row of the DataFrame from the previous step. I just let it run and stopped it after ~1000 photos; it might be better to randomly sample 1000 rows instead.
-
Download the list of all taxa in the GlobalTreeSearch database (DOI: 10.13140/RG.2.2.33593.90725)
- https://tools.bgci.org/global_tree_search.php
- I had to remove the two extra rows at the top of the file (the citation and the DOI) so that pandas could parse it.
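Instead of editing the file by hand, pandas can skip those rows at read time. This sketch assumes the citation and DOI occupy exactly the first two lines and the column header comes right after them. `load_global_tree_search` is a hypothetical helper.

```python
import pandas as pd


def load_global_tree_search(path):
    """Read the GlobalTreeSearch export, skipping the citation and DOI lines."""
    # skiprows=2 drops the first two physical lines before the header is parsed.
    return pd.read_csv(path, skiprows=2)
```

It is worth printing the resulting columns once, since the name of the species column in the export is not recorded here.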
-
Get the taxon_ids for all 60k tree taxa.
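The notes don't say how tree names were matched to taxon_ids. One plausible approach, sketched here as an assumption, is an exact match of the GlobalTreeSearch name against the name column of the iNaturalist taxa table. `name_col="taxon_name"` is a hypothetical column name and `tree_taxon_ids` a hypothetical helper.

```python
import pandas as pd


def tree_taxon_ids(trees: pd.DataFrame, taxa: pd.DataFrame, name_col="taxon_name") -> set:
    """Map tree names to taxon_ids by exact scientific-name match.

    Names without an exact match (synonyms, spelling variants) are silently
    dropped, and a homonym in another group would be matched too.
    """
    names = set(trees[name_col].astype(str).str.strip())
    matched = taxa[taxa["name"].isin(names)]
    return {int(t) for t in matched["taxon_id"]}
```

Comparing the size of the returned set with the ~60k input names gives a quick check on how many names failed to match.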
-
Remove the taxon_ids that appear in the Eucalyptus list from the tree list to get the "other tree" category.
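A minimal sketch of this filter, assuming the tree list is a DataFrame with a taxon_id column and the Eucalyptus list is any collection of ids. `other_trees` is a hypothetical helper.

```python
import pandas as pd


def other_trees(tree_taxa: pd.DataFrame, eucalyptus_ids) -> pd.DataFrame:
    """Drop tree taxa whose taxon_id is in the Eucalyptus list."""
    keep = ~tree_taxa["taxon_id"].isin(set(eucalyptus_ids))
    return tree_taxa[keep].reset_index(drop=True)
```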