
Diagnostic: Missing names on record

This record shows as incertae sedis but the lookup should find the species.

I'll investigate, cc @mdoering

The lookup cache contains:

hbase(main):001:0> scan 'name_usage_kv', { FILTER => "RowFilter(=, 'substring:Lissotarsus reticulata')" }
ROW                                                       COLUMN+CELL                                                                                                                                                             
 6|||||||||Lissotarsus reticulata Chaudoir, 1842|||||     column=v:j, timestamp=1696041217943, value={"synonym":true,"usage":{"key":9355155,"name":"Lissotarsus reticulatus Chaudoir, 1842","rank":"SPECIES"},"acceptedUsage":{"ke
                                                          y":7811407,"name":"Platyderus reticulatus (Chaudoir, 1842)","rank":"SPECIES"},"classification":[{"key":1,"name":"Animalia","rank":"KINGDOM"},{"key":54,"name":"Arthropod
                                                          260555,"name":"Platyderus","rank":"GENUS"},{"key":7811407,"name":"Platyderus reticulatus","rank":"SPECIES"}],"diagnostics":{"matchType":"FUZZY","confidence":99,"status"
                                                          :"SYNONYM","lineage":[],"alternatives":[]},"iucnRedListCategory":{"category":"NOT_EVALUATED","code":"NE","scientificName":"Lissotarsus reticulatus Chaudoir, 1842","taxo
                                                          nomicStatus":"SYNONYM","acceptedName":"Platyderus reticulatus (Chaudoir, 1842)"},"issues":[]}                                                                           
1 row(s) in 45.7350 seconds

Formatted for readability:

The cell timestamp (1696041217943) corresponds to Saturday, September 30, 2023 2:33:37.943 AM (UTC).

    "name":"Lissotarsus reticulatus Chaudoir, 1842",
    "name":"Platyderus reticulatus (Chaudoir, 1842)",
      "name":"Platyderus reticulatus",
    "scientificName":"Lissotarsus reticulatus Chaudoir, 1842",
    "acceptedName":"Platyderus reticulatus (Chaudoir, 1842)"
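
For anyone wanting to check when the entry was cached, here is a minimal sketch of the conversion. It assumes the HBase cell timestamp is the default epoch-milliseconds value; the class and method names are made up for illustration.

```java
import java.time.Instant;

// Illustrative helper only; not part of the pipeline code.
class CellTimestampSketch {

  /** Converts an epoch-millisecond timestamp to an ISO-8601 UTC string. */
  static String toUtc(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).toString();
  }

  public static void main(String[] args) {
    // Timestamp taken from the scan output above.
    System.out.println(toUtc(1696041217943L)); // 2023-09-30T02:33:37.943Z
  }
}
```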

The lookup appears to have worked and the result was cached as expected, but the match wasn't included in the interpreted record. Reprocessing yields the same result.

With @muttcg's help, we have diagnosed this, and it's behaving as intended, @mdoering.

It's dropping into this branch:

      if (usageMatch == null || isEmpty(usageMatch) || checkFuzzy(usageMatch, identification)) {
        // "NO_MATCHING_RESULTS". This
        // happens when we get an empty response from the WS
        addIssue(tr, TAXON_MATCH_NONE);

The web service is returning a fuzzy match (reticulata vs. reticulatus). As we described in this issue, if there are no higher taxa on the record (there aren't in this case), we don't assume a fuzzy match is correct, because doing so made too many mistakes. This record needs a higher taxon added in order to match.
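
To make the rule concrete, here is a rough sketch of the kind of check described above. It is not the actual `checkFuzzy` implementation: the class name, the method names, the use of a plain map for the verbatim record, and the list of ranks counted as "higher taxa" are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the pipeline's real checkFuzzy implementation.
class FuzzyMatchSketch {

  /** Ranks treated as "higher taxa" in this sketch (an assumption). */
  static final String[] HIGHER_RANKS = {"kingdom", "phylum", "class", "order", "family"};

  /** True when the verbatim record supplies at least one non-blank higher rank. */
  static boolean hasHigherTaxon(Map<String, String> verbatim) {
    for (String rank : HIGHER_RANKS) {
      String value = verbatim.get(rank);
      if (value != null && !value.trim().isEmpty()) {
        return true;
      }
    }
    return false;
  }

  /** True when the match should be discarded: it is fuzzy and nothing on the record corroborates it. */
  static boolean rejectFuzzy(String matchType, Map<String, String> verbatim) {
    return "FUZZY".equals(matchType) && !hasHigherTaxon(verbatim);
  }

  public static void main(String[] args) {
    Map<String, String> record = new LinkedHashMap<>();
    record.put("scientificName", "Lissotarsus reticulata Chaudoir, 1842");

    // Name only: the fuzzy match is rejected.
    System.out.println(rejectFuzzy("FUZZY", record)); // true

    // With a higher taxon on the record, the fuzzy match is kept.
    record.put("kingdom", "Animalia");
    System.out.println(rejectFuzzy("FUZZY", record)); // false
  }
}
```

In this sketch an exact match is never rejected, and a fuzzy match survives as soon as any one higher rank is present on the record.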

I don't think we want to change this behavior - agree?

As it happens, this is a narrowly scoped dataset (titled "Coleoptera...") so we could add a default of kingdom = Animalia in the registry, which would at least improve this dataset.

Ah, that makes sense. It would be great to understand why that has happened from a user perspective, but yes we should keep it. And for sure add a default classification to the dataset. I see this is done already.

We could add more, but I'll start conservatively

Animalia was enough for this example, but there were records being interpreted as Fungi as well, so I added Animalia / Arthropoda / Insecta, and that has put the dataset in better shape.