[BUG] Male records are retained when filtering `type_comp == 'Onbekend'` in countEmbryos()
mvarewyck opened this issue
Describe the bug
When filtering the data within `countEmbryos()`, a bug retains some records that have `type_comp == "Onbekend"` and `geslacht_comp == "Mannelijk"`. The issue only occurs for records with unknown type, because for all other types we already filter on the female types.
To Reproduce
```r
library(reportingGrofwild)
ecoData <- loadRawData(type = "eco")
plotData <- countEmbryos(data = ecoData[ecoData$wildsoort == "Damhert", ], type = "Onbekend")$data
sum(plotData$Freq)
# [1] 19
filterData <- ecoData[ecoData$type_comp == "Onbekend" & ecoData$wildsoort == "Damhert", ]
nrow(filterData)
# [1] 19
table(filterData$geslacht_comp)
#
# Vrouwelijk  Mannelijk   Onbekend
#          8          1         10
```
Expected behavior
Exclude male records (`geslacht_comp == "Mannelijk"`) within `countEmbryos()`.
Git SHA (after 0.3.1): 7568c97e249da29bc34f3581c2c549d45a14777f
@SanderDevisscher How do we exclude the males? The question mostly concerns records with unknown `type_comp`; for the other types we automatically select the females.
(1) Retain records with `geslacht_comp != "Mannelijk"`
-> Records with unknown gender that are actually males can still be retained, so we might have too many records with `type_comp == "Onbekend"` in the `countEmbryos()` plot.
```r
ecoData <- loadRawData(type = "eco")
allSpecies <- unique(ecoData$wildsoort)
sapply(allSpecies, function(iSpecies) {
  filterData <- ecoData[ecoData$type_comp == "Onbekend" &
      ecoData$wildsoort == iSpecies &
      ecoData$geslacht_comp != "Mannelijk", ]
  table(filterData$geslacht_comp)
})
#            Wild zwijn Edelhert Damhert Ree
# Vrouwelijk         28        0       8 133
# Mannelijk           0        0       0   0
# Onbekend          615        1      10 873
```
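A minimal sketch of option (1) on a hypothetical toy data frame. Only the column names `type_comp` and `geslacht_comp` and their values come from this issue; the rows are invented, and this is not the actual filtering code inside `countEmbryos()`. It shows that a record with both gender and type unknown survives the filter, even though it could be a male.

```r
# Hypothetical toy records (invented for illustration, not taken from ecoData);
# assumes no NA values in these columns
toyData <- data.frame(
  type_comp     = c("Reegeit", "Reegeit", "Onbekend", "Onbekend", "Onbekend"),
  geslacht_comp = c("Vrouwelijk", "Onbekend", "Vrouwelijk", "Mannelijk", "Onbekend"),
  stringsAsFactors = FALSE
)

# Option (1): drop only the records explicitly recorded as male
option1 <- toyData[toyData$geslacht_comp != "Mannelijk", ]
nrow(option1)  # 4: the fully unknown record (last row) is still retained
```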
(2) Retain only records with `geslacht_comp == "Vrouwelijk"`
-> This excludes far too many records, because many records with unknown gender still have a known type.
```r
table(droplevels(ecoData$type_comp[ecoData$geslacht_comp == "Onbekend"]))
#
#     Smalree Jaarlingbok     Reegeit      Reebok    Onbekend
#          10           6         166          58        1499
```
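The same kind of sketch for option (2), again on invented toy rows rather than `ecoData`: keeping only `geslacht_comp == "Vrouwelijk"` also drops the `Reegeit` record whose gender is unknown, even though its type already identifies it as female.

```r
# Hypothetical toy records (invented for illustration, not taken from ecoData)
toyData <- data.frame(
  type_comp     = c("Reegeit", "Reegeit", "Onbekend", "Onbekend", "Onbekend"),
  geslacht_comp = c("Vrouwelijk", "Onbekend", "Vrouwelijk", "Mannelijk", "Onbekend"),
  stringsAsFactors = FALSE
)

# Option (2): keep only the records explicitly recorded as female
option2 <- toyData[toyData$geslacht_comp == "Vrouwelijk", ]
nrow(option2)  # 2: the Reegeit with unknown gender is lost
```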
(3) Exclude records with `geslacht_comp == "Mannelijk"` OR (`geslacht_comp == "Onbekend"` & `type_comp == "Onbekend"`)
-> We might also exclude some female records, so there may be too few records with `type_comp == "Onbekend"` in the `countEmbryos()` plot.
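A sketch of option (3) as a single exclusion mask, on the same kind of invented toy rows. The mask mirrors the rule stated above; it is an illustration, not the code that ended up in `countEmbryos()`. Males and fully unknown records are dropped, while the `Reegeit` record with unknown gender is kept.

```r
# Hypothetical toy records (invented for illustration, not taken from ecoData);
# assumes no NA values, since NA in a logical index would add NA rows
toyData <- data.frame(
  type_comp     = c("Reegeit", "Reegeit", "Onbekend", "Onbekend", "Onbekend"),
  geslacht_comp = c("Vrouwelijk", "Onbekend", "Vrouwelijk", "Mannelijk", "Onbekend"),
  stringsAsFactors = FALSE
)

# Option (3): exclude explicit males and records with neither sex nor type
isMale <- toyData$geslacht_comp == "Mannelijk"
isFullyUnknown <- toyData$geslacht_comp == "Onbekend" & toyData$type_comp == "Onbekend"
option3 <- toyData[!(isMale | isFullyUnknown), ]
nrow(option3)  # 3
```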
So I think the decision is between (1) and (3), depending on whether you want to retain or exclude the records for which both gender and type are unknown. Or am I missing something?
I would go for the third option: explicitly male individuals and fully unknown records (no sex and no type) should be excluded.
Option 2 indicates that we need to add some logic in the Backoffice to check whether these records are in fact correct and, if so, to derive the sex from the type.
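One possible shape for that logic, sketched in R under assumptions: the mapping covers only the roe deer (Ree) types that appear in this issue, `inferSex()` and `sexConflict()` are hypothetical helper names, and both inputs are plain character vectors without NA. Records with unknown sex get the sex implied by their type, and records whose recorded sex contradicts their type are flagged for manual checking.

```r
# Assumed mapping from roe deer type to sex (hypothetical; extend per species)
typeToSex <- c(
  Smalree     = "Vrouwelijk",
  Reegeit     = "Vrouwelijk",
  Jaarlingbok = "Mannelijk",
  Reebok      = "Mannelijk"
)

# Fill in unknown sex from the type; types missing from the mapping stay unknown
inferSex <- function(geslacht, type, mapping = typeToSex) {
  implied <- unname(mapping[type])  # NA for types not in the mapping
  fillable <- geslacht == "Onbekend" & !is.na(implied)
  geslacht[fillable] <- implied[fillable]
  geslacht
}

# TRUE where a recorded (known) sex disagrees with the sex implied by the type
sexConflict <- function(geslacht, type, mapping = typeToSex) {
  implied <- unname(mapping[type])
  !is.na(implied) & geslacht != "Onbekend" & geslacht != implied
}

geslacht <- c("Onbekend", "Onbekend", "Vrouwelijk", "Onbekend")
type     <- c("Reegeit", "Jaarlingbok", "Reebok", "Onbekend")
inferSex(geslacht, type)     # c("Vrouwelijk", "Mannelijk", "Vrouwelijk", "Onbekend")
sexConflict(geslacht, type)  # c(FALSE, FALSE, TRUE, FALSE)
```

Flagging conflicts instead of overwriting them keeps the check separate from the correction, so contradictory records can be reviewed before any sex is changed.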