inbo/camtrapdp

`build_taxonomy()` should check for `scientificNames` that occur more than once

PietrH opened this issue · 2 comments

If a scientificName occurs more than once, emit a warning and use the first one.

Possibly with unique() from base R.

If scientificName is repeated in x$taxonomic, it'll result in rows in observations being repeated after the left_join in taxonomy()
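A minimal sketch of that duplication, using made-up data. The table contents and the placeholder taxonID values are assumptions for illustration, and base R merge(..., all.x = TRUE) stands in for the left_join to keep the example dependency-free:

```r
# Hypothetical data: two observations, and a taxonomy table in which
# "Vulpes vulpes" appears twice. The taxonID values are placeholders.
observations <- data.frame(
  observationID = c("obs1", "obs2"),
  scientificName = c("Vulpes vulpes", "Anas platyrhynchos")
)
taxonomy <- data.frame(
  scientificName = c("Vulpes vulpes", "Vulpes vulpes", "Anas platyrhynchos"),
  taxonID = c("id-a", "id-b", "id-c")
)

# Left join on scientificName: every observation is kept, but an observation
# is repeated once for each taxonomy row that matches it.
joined <- merge(observations, taxonomy, by = "scientificName", all.x = TRUE)

nrow(observations) # 2
nrow(joined)       # 3: obs1 now appears twice
```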

If the second list member (the record that duplicates a scientificName) has fields that are not present in the first record, should those be dropped as well?

For example:

list(list(
  scientificName = "Vulpes vulpes", taxonID = "https://www.checklistbank.org/dataset/COL2023/taxon/5BSG3",
  taxonRank = "species", vernacularNames = list(
    eng = "red fox",
    nld = "vos"
  )
), list(
  scientificName = "Vulpes vulpes",
  taxonID = "https://www.wikidata.org/wiki/Q8332", taxonRank = "species",
  vernacularNames = list(eng = "red fox", lbe = "Цулчӏа")
))

The second record has an extra language. As I understand it now, this extra language will not become an extra column in the output:

| scientificName | taxonID | taxonRank | vernacularNames.eng | vernacularNames.nld |
| --- | --- | --- | --- | --- |
| Vulpes vulpes | https://www.checklistbank.org/dataset/COL2023/taxon/5BSG3 | species | red fox | vos |

I want to avoid having to merge multiple x$taxonomic list members, so I don't really want to add the extra language.

Agreed, no merging, just pick the first, e.g. with distinct(x, scientificName, .keep_all = TRUE).
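A rough sketch of the "warn and keep the first" behaviour, applied directly to a list shaped like x$taxonomic. The helper name drop_duplicate_taxa() is hypothetical and not part of camtrapdp, and it assumes every list member has a scientificName:

```r
# Hypothetical helper (not part of camtrapdp): keep the first list member per
# scientificName and warn about the names whose later records are dropped.
# Assumes every list member has a scientificName element.
drop_duplicate_taxa <- function(taxonomic) {
  scientific_names <- vapply(
    taxonomic,
    function(taxon) taxon$scientificName,
    character(1)
  )
  # duplicated() flags the second and later occurrences, never the first.
  is_duplicate <- duplicated(scientific_names)
  if (any(is_duplicate)) {
    warning(
      "Duplicate scientificName(s) found, keeping only the first record: ",
      paste(unique(scientific_names[is_duplicate]), collapse = ", "),
      call. = FALSE
    )
  }
  taxonomic[!is_duplicate]
}

# The example from this issue: two records for "Vulpes vulpes".
taxonomic <- list(
  list(
    scientificName = "Vulpes vulpes",
    taxonID = "https://www.checklistbank.org/dataset/COL2023/taxon/5BSG3",
    taxonRank = "species",
    vernacularNames = list(eng = "red fox", nld = "vos")
  ),
  list(
    scientificName = "Vulpes vulpes",
    taxonID = "https://www.wikidata.org/wiki/Q8332",
    taxonRank = "species",
    vernacularNames = list(eng = "red fox", lbe = "Цулчӏа")
  )
)

# suppressWarnings() only keeps this example quiet; a plain call warns.
deduplicated <- suppressWarnings(drop_duplicate_taxa(taxonomic))
length(deduplicated)                      # 1
names(deduplicated[[1]]$vernacularNames)  # "eng" "nld"
```

Because the second record is dropped whole, its extra language (lbe) never reaches the output, so no merging of list members is needed.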