
Would like to confirm my understanding of the taxonomy directory structure

momonala opened this issue · 1 comments

Regarding the directory naming convention, I want to confirm that the following is correct:

The first directory is the organism class, then the name of the subdirectory is some permutation of the following:

-- genus -- species -- subspecies --

where each taxonomic category is separated by a space. Can you confirm? Thanks!

Hi momonala,

For the 2017 dataset, the top-level directory (containing 13 folders) uses the same top-level grouping as iNat. This is an iNat creation, not a biologically meaningful grouping. Inside each of those directories are the folders of species data (a small percentage are at the genus or subspecies level instead). As you pointed out, the name of each directory is genus - species - subspecies.
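
In case it is useful, here is a rough sketch of how the 2017 naming convention described above could be parsed. It assumes the parts of a folder name are separated by spaces, as in the question, and that a name with fewer than three parts is simply missing the lower ranks. The `TaxonDir` type, the `parse_taxon_dir` function, and the example paths are hypothetical and only for illustration; they are not part of the dataset tooling.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class TaxonDir(NamedTuple):
    """Hypothetical container for the pieces of one 2017-style folder path."""
    group: str                 # top-level iNat grouping (one of the 13 folders)
    genus: str
    species: Optional[str]     # None when the folder is genus-level only
    subspecies: Optional[str]  # None unless a third part is present


def parse_taxon_dir(path: str) -> TaxonDir:
    """Split '<group>/<genus> [species] [subspecies]' into its parts.

    Assumption: the parts of the folder name are separated by whitespace.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 2:
        raise ValueError(f"expected '<group>/<name>', got {path!r}")
    group, name = parts[-2], parts[-1]

    tokens = name.split()
    if not tokens:
        raise ValueError(f"empty folder name in {path!r}")

    genus = tokens[0]
    species = tokens[1] if len(tokens) > 1 else None
    # Anything after the second token is treated as the subspecies.
    subspecies = " ".join(tokens[2:]) or None
    return TaxonDir(group, genus, species, subspecies)


# Illustrative paths only; not checked against the real dataset contents.
print(parse_taxon_dir("Aves/Larus argentatus"))
print(parse_taxon_dir("Aves/Larus"))
```

Returning `None` for missing ranks keeps genus-level and species-level folders distinguishable without guessing at names that are not in the path.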

For the just-released 2018 dataset, the full taxonomy is provided and the classes come only from species. However, the English names of the different taxonomic levels have been obfuscated for the competition.

Hope that helps.