pyinat/pyinaturalist

make_tree: tree with omitted ranks has out of order children

Closed this issue · 2 comments

synrg commented

The problem

When making a taxon tree from a life list with some ranks omitted (i.e. include_ranks does not contain every intermediate rank from the root down to the leaves of the resulting tree), children are still grouped and sorted by the omitted ranks, so they appear out of order.

Expected behavior

The expected order for my example below is:

Family Andrenidae
└── Genus Andrena
    ├── Andrena clarkella
    ├── Andrena crataegi
    ├── Andrena dunningi
    ├── Andrena frigida
    ├── Andrena milwaukeensis
    ├── Andrena nubecula
    └── Andrena wilkella

Steps to reproduce the behavior

>>> from pyinaturalist import iNatClient, make_tree, pprint_tree
>>> client = iNatClient()
>>> life_list = client.observations.life_list(user_id=545640, taxon_id=57669)
>>> tree = make_tree(life_list.data, include_ranks=['family','genus','species'])
>>> pprint_tree(tree)
Family Andrenidae
└── Genus Andrena
    ├── Andrena clarkella
    ├── Andrena frigida
    ├── Andrena milwaukeensis
    ├── Andrena nubecula
    ├── Andrena dunningi
    ├── Andrena crataegi
    └── Andrena wilkella
>>> tree.flatten()
[
    TaxonCount(id=57668, iconic_taxon_name='Unknown', is_active=True, name='Andrenidae', parent_id=630955, rank_level=30, rank='family', descendant_obs_count=111),
    TaxonCount(id=57669, iconic_taxon_name='Unknown', is_active=True, name='Andrena', parent_id=958234, rank_level=20, rank='genus', count=49, descendant_obs_count=111),
    TaxonCount(id=198998, iconic_taxon_name='Unknown', is_active=True, name='Andrena clarkella', parent_id=571358, rank_level=10, rank='species', count=7, descendant_obs_count=7),
    TaxonCount(id=198991, iconic_taxon_name='Unknown', is_active=True, name='Andrena frigida', parent_id=571358, rank_level=10, rank='species', count=1, descendant_obs_count=1),
    TaxonCount(id=198981, iconic_taxon_name='Unknown', is_active=True, name='Andrena milwaukeensis', parent_id=571358, rank_level=10, rank='species', count=7, descendant_obs_count=7),
    TaxonCount(id=198973, iconic_taxon_name='Unknown', is_active=True, name='Andrena nubecula', parent_id=571188, rank_level=10, rank='species', count=5, descendant_obs_count=5),
    TaxonCount(id=198997, iconic_taxon_name='Unknown', is_active=True, name='Andrena dunningi', parent_id=571409, rank_level=10, rank='species', count=2, descendant_obs_count=2),
    TaxonCount(id=199011, iconic_taxon_name='Unknown', is_active=True, name='Andrena crataegi', parent_id=571426, rank_level=10, rank='species', count=5, descendant_obs_count=5),
    TaxonCount(id=127785, iconic_taxon_name='Unknown', is_active=True, name='Andrena wilkella', parent_id=571443, rank_level=10, rank='species', count=12, descendant_obs_count=12)
]
>>> 

The problem arises from the subgenera under Andrena not being included. The order of the children is correct with respect to each subgenus, but as those aren't present in the resulting tree, the children are out of order so far as the user can see.
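To make this concrete, here is a small standalone check using only the names and parent_id values copied from the flatten() output above. It does not call the library at all; it just groups consecutive species by parent_id (which I'm assuming corresponds to the omitted subgenus) and confirms that each group is sorted internally while the combined list is not.

```python
from itertools import groupby

# (name, parent_id) pairs copied from the flatten() output in this report
species = [
    ("Andrena clarkella", 571358),
    ("Andrena frigida", 571358),
    ("Andrena milwaukeensis", 571358),
    ("Andrena nubecula", 571188),
    ("Andrena dunningi", 571409),
    ("Andrena crataegi", 571426),
    ("Andrena wilkella", 571443),
]

# Consecutive species sharing a parent_id are assumed to sit under
# the same (omitted) subgenus
groups = [
    [name for name, _ in group]
    for _, group in groupby(species, key=lambda item: item[1])
]

# Each group is alphabetical on its own...
within_sorted = all(group == sorted(group) for group in groups)

# ...but the concatenated list the user actually sees is not
names = [name for name, _ in species]
overall_sorted = names == sorted(names)

print(groups)
print(within_sorted, overall_sorted)
```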

Workarounds

Traverse the tree manually to build groups with children sorted in the desired alphabetical order. This comes at the expense of losing the convenience of pprint_tree() and flatten(), which handle the tree traversal transparently.
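A rough sketch of that kind of traversal, for illustration only. Node, sort_children() and flatten_names() are hypothetical stand-ins I wrote for this example, not library classes or functions; the sketch assumes each tree node has a name and a list of children, and I haven't verified that the real tree objects can be re-sorted this way.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node (not the library's class)."""
    name: str
    children: list = field(default_factory=list)


def sort_children(node):
    """Recursively sort each node's children alphabetically by name."""
    node.children.sort(key=lambda child: child.name)
    for child in node.children:
        sort_children(child)
    return node


def flatten_names(node):
    """Depth-first list of names, parent before children."""
    names = [node.name]
    for child in node.children:
        names.extend(flatten_names(child))
    return names


# Children in the out-of-order sequence shown in this report
genus = Node("Andrena", [
    Node("Andrena clarkella"),
    Node("Andrena frigida"),
    Node("Andrena milwaukeensis"),
    Node("Andrena nubecula"),
    Node("Andrena dunningi"),
    Node("Andrena crataegi"),
    Node("Andrena wilkella"),
])
family = Node("Andrenidae", [genus])

sorted_names = flatten_names(sort_children(family))
print("\n".join(sorted_names))
```

After sorting, the species come out in the order listed under "Expected behavior".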

Environment

  • OS & version: Debian 10
  • Python version: 3.11
  • Pyinaturalist version or branch: main

JWCook commented

Thanks for the detailed bug report! I will get that fixed soon.

JWCook commented

Fixed!