beckerbenj/eatGADS

Function for recoding all values within a certain range as missing

nickhaf opened this issue · 11 comments

First idea for this function in BT21 data preparation

Could also delete metadata of missings not in the data

The idea for this function is to automate the typical steps needed when preparing missing values in a data set. It could be, at least in part, a wrapper for other eatGADS functions such as checkMissingValLabels(), checkMissings() ... A function prepare_missings() could include:

  • check for missings (within a predefined range, e.g. -50 to -100) without label (checkMissingValLabels)
  • set Metadata of values in a certain range to missing
  • set Metadata of values with specified character strings to missing (checkMissings())
  • remove missing labels if the corresponding missing values do not appear in the variable

@beckerbenj do you think this would be a sensible addition to the package?

Sounds reasonable! Such a function would probably wrap at least checkMissingValLabels(), checkEmptyValLabels(), removeValLabels() and checkMissings()?

prepare_missings <- function(my_gads) {
  # Check for missings without label
  no_label <- checkMissingValLabels(my_gads, valueRange = c(-50, -100))

  # Report every variable for which the check returned something
  for (i in names(no_label)) {
    if (length(no_label[[i]]) > 0) {
      print(paste0("No Label: ", i))
    }
  }

  # Update missing tags based on the value labels
  my_gads <- checkMissings(my_gads, missingLabel =
    "missing - by intention|missing - invalid response|missing - coding impossible|missing - not reached|missing - by design|missing - nicht kalkulierbar|Auslandsabschluss|kann ich nicht beurteilen"
  )

  # Remove unused missing labels
  empty_labels <- checkEmptyValLabels(my_gads, output = "list")

  for (i in names(empty_labels)) {
    empty_values <- empty_labels[[i]]$value

    for (j in empty_values) {
      # Only touch values in the missing range (below -50)
      if (as.numeric(j) < -50) {
        my_gads <- removeValLabels(my_gads, varName = i, value = j)
      }
    }
  }

  # Adjust the format
  my_gads <- suppressMessages(checkFormat(my_gads, changeFormat = TRUE))

  return(my_gads)
}

lfb_allg <- prepare_missings(lfb_allg)

@nickhaf :
We should discuss whether this is still relevant (maybe ask BT team for their opinion).

If I understand your commit correctly, it dealt with this issue by adding checkMissingsByValues(), correct? Is anything missing that we initially wanted to implement?
And sorry for the delay.

No problem.
If I understand your initial proposal correctly, it suggests a wrapper function for checkMissings() and checkMissingsByValues(), which should also remove missing labels if the corresponding missing values do not appear in the variable. The last aspect is not yet implemented in eatGADS, but I am also not sure whether this is a frequent requirement in BT applications.

From my perspective checkMissings() and checkMissingsByValues() are sufficient but I am open to other suggestions.

If I remember correctly, it is often the case that there are labels for -97, -96, -95 ..., even if these values do not actually occur in the variable. I think we removed them manually in this case, so a function dealing with that might be nice to have. But I'm also fine with not implementing it for now.
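To make the situation concrete, here is a minimal base R sketch. It is not eatGADS code: the variable `x` and the `labels` table are made up for illustration and only loosely modelled on value-level metadata. It finds labelled missing codes that never occur in the data and drops them from the label table.

```r
# Toy variable: -97 is the only missing code that actually occurs
x <- c(1, 2, 2, -97, 3)

# Hypothetical label table (assumption: one row per labelled value)
labels <- data.frame(
  value    = c(-97, -96, -95, 1, 2, 3),
  valLabel = c("missing - by intention", "missing - invalid response",
               "missing - not reached", "low", "medium", "high"),
  missings = c("miss", "miss", "miss", "valid", "valid", "valid"),
  stringsAsFactors = FALSE
)

# Missing codes that carry a label but have no data basis
unused <- setdiff(labels$value[labels$missings == "miss"], x)

# Drop the corresponding rows from the label table
labels_clean <- labels[!labels$value %in% unused, ]
```

In this toy case `unused` holds -96 and -95, which is the kind of leftover metadata that was removed manually so far.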

Can @enkeflor or @liebelta comment on this? If indeed superfluous missing tags should be removed (which I would not have guessed), such functionality would be desirable.

I think removing labels from the metadata that have no basis in the data makes sense. Otherwise, discrepancies can occur during checking (or even later during analysis). Even though we are only talking about missing labels here, they still carry substantive meaning. The workaround described above is ok (we partly use different approaches here), but a uniform function would still be very good. It does not have priority, though.

Desired solution (as discussed with @enkeflor):
Implement new function removeEmptyValLabels() which wraps checkEmptyValLabels() and removeValLabels() as suggested in the above post by @nickhaf
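A rough sketch of what such a wrapper could look like, based only on the calls already used in the snippet above. This is an assumption-laden illustration, not the package's implementation: the name `removeEmptyValLabels_sketch` and the `valueRange` argument are hypothetical, and the structure of the list returned by checkEmptyValLabels() is taken from the earlier code (one element per variable with a `value` column).

```r
library(eatGADS)

# Hypothetical sketch: remove value labels in `valueRange` that have no
# corresponding values in the data. Not the actual eatGADS implementation.
removeEmptyValLabels_sketch <- function(GADSdat, valueRange = c(-100, -50)) {
  empty_labels <- checkEmptyValLabels(GADSdat, output = "list")

  for (varName in names(empty_labels)) {
    values <- as.numeric(empty_labels[[varName]]$value)
    # Keep only empty labels inside the requested range
    in_range <- !is.na(values) &
      values >= min(valueRange) & values <= max(valueRange)

    for (value in values[in_range]) {
      GADSdat <- removeValLabels(GADSdat, varName = varName, value = value)
    }
  }

  GADSdat
}
```

Usage would then mirror the earlier call, e.g. `lfb_allg <- removeEmptyValLabels_sketch(lfb_allg)`. Restricting the removal to a value range keeps unused labels of valid categories untouched, which matches the `< -50` filter in the original snippet.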