pfmc-assessments/nwfscSurvey

Add common name check in the pull code

chantelwetzel-noaa opened this issue · 5 comments

It is often unclear to users how common species names should be formatted to pull and return the data of interest. Creating a function that corrects inconsistent capitalization (Dover sole vs. dover sole vs. Dover Sole) would make pulling data easier for users.

In the past, we've discussed exposing a table of names (common, scientific) in the Data Warehouse as an additional metadata table. That likely won't solve every issue (e.g., users not looking at the metadata) so perhaps integrating some fuzzy logic would also help.

@kellijohnson-NOAA has integrated some code into the package that creates a table of species information in the GetSpp.fn function. My thought is to use the information created by this function in an internal check of the common or scientific name supplied by the user, so that if a user specifies:

PullCatch.fn(Name = "dover sole", SurveyName = "NWFSC.Combo")

the function would internally correct the Name input to "Dover sole". Should be fairly simple based on existing functionality for species that are commonly used by the NWFSC.

Ahh right, I had forgotten about the GetSpp.fn. Sounds like a good plan.

@chantelwetzel-noaa do you want to just use tolower() and grep() as a check to make sure that the common name exists and then, when a match is found, assign the match rather than the user input? I can easily create this if you want.
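
A minimal sketch of that check, for illustration only. The lookup table and the helper name `correct_common_name()` are hypothetical and are not part of nwfscSurvey; in the package the names would presumably come from the species table built by GetSpp.fn. The sketch compares lowercased names with `==` instead of grep(), an assumption made here to avoid partial matches.

```r
# Hypothetical lookup table; entries other than "Dover sole" are illustrative only.
species_table <- data.frame(
  common = c("Dover sole", "lingcod", "sablefish"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the stored spelling of a user-supplied common name.
correct_common_name <- function(name, known = species_table$common) {
  # Compare lowercased, whitespace-trimmed strings so capitalization is ignored.
  idx <- which(tolower(known) == tolower(trimws(name)))
  if (length(idx) == 0) {
    stop("No species found matching '", name, "'.")
  }
  # Return the spelling stored in the table rather than the user input.
  known[idx[1]]
}

correct_common_name("dover sole")  # "Dover sole"
correct_common_name("Dover Sole")  # "Dover sole"
```

The corrected value could then be passed on in place of the user's `Name` argument.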

Yes. This is something that has been on my to-do list. Is the approach you are considering to use tolower() to match against a column in the saved csv files with all observed species names, to ensure a correct name is passed? The only minor issue that I have encountered when using a similar approach on other projects is that species with joint species names (e.g., vermilion/sunset rockfish) can return multiple matches. However, I think this can probably be dealt with easily inside a function.
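
One possible way to handle the multiple-match case, sketched under assumptions: the vector of names and the helper `resolve_common_name()` are hypothetical, and the real name list would come from the saved species files. The idea is to use grep() on lowercased names, prefer an exact match when several names are hit, and otherwise stop and list the candidates.

```r
# Hypothetical set of stored common names, including a joint species name.
known_names <- c(
  "Dover sole",
  "vermilion rockfish",
  "sunset rockfish",
  "vermilion/sunset rockfish"
)

# Hypothetical helper: resolve user input to exactly one stored name.
resolve_common_name <- function(name, known = known_names) {
  query <- tolower(trimws(name))
  # fixed = TRUE treats the input as a literal string, not a regular expression.
  hits <- grep(query, tolower(known), fixed = TRUE)
  if (length(hits) == 0) {
    stop("No species found matching '", name, "'.")
  }
  if (length(hits) > 1) {
    # Several names contain the input; prefer one that matches it exactly.
    exact <- hits[tolower(known[hits]) == query]
    if (length(exact) == 1) {
      return(known[exact])
    }
    stop(
      "'", name, "' matches several species: ",
      paste(known[hits], collapse = ", "),
      ". Please be more specific."
    )
  }
  known[hits]
}

resolve_common_name("Sunset Rockfish")  # "sunset rockfish" (exact match wins)
resolve_common_name("dover")            # "Dover sole" (single partial match)
```

With this table, an ambiguous input such as "vermilion" stops with an error listing both candidate names instead of silently picking one.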