mountainMath/cansim

Make normalize_cansim_values(factors=TRUE) the default

Closed this issue · 6 comments

I always call normalize_cansim_values(factors=TRUE) and many others do the same. I suspect that those that don’t do this only do so because they don’t know about this.

To not break too much backward compatibility I suggest to store the normalized “VALUE” column as “Value” and add the Date column by default. This should not break anything. Moreover I propose to convert the variable columns to factors by default too. That may break some workflows in rare circumstances.

We can keep the normalize_cansim_values as is and replace VALUE with Value if called. If it is called with factors=FALSE, the default, we cast the columns back to character.

Thanks for proposing this enhancement, but I have a question. Are you suggesting

  • that (factors = TRUE) be the default for the normalize_cansim_values() function, or
  • adding normalization to the get_cansim() function?

My suggestions will depend on which way your thinking is going.

My suggestion is that the get_cansim() output would in the future look exactly like what
get_cansim() %>% normalize_cansim_values(factors=TRUE, replacement_value = "Value")
produces. But if normalize_cansim_values is explicitly called, then the result would be identical to current behaviour.

This ensures backward compatibility of existing code if normalize_cansim_values is called explicitly.

If normalize_cansim_values is not called explicitly, there would be an additional "Value" column with the normalized values, and the member columns would be factors instead of characters. This may break backward compatibility in some cases, for example when joining other data that have a column named "Value" or if operations rely on member columns being characters instead of factors. However, I think it would be rare that those issues arise.

Got it, thanks.

I like the idea of adding an additional variable with the normalized values, but will suggest that it have a name that adds further differentiation. I'm a regular user of janitor::clean_names() on my {cansim} downloads, because it efficiently deals with all of the other issues that crop up with all of the other variable names (spaces, characters, etc). The clean_names() function will change both "VALUE" and "Value" to "value".

So I would like to suggest that the new column, with normalized values, be called something like "value_normalized" so that it is not simply case-sensitive.

That's a good point. I would probably prefer a shorter name, mostly because otherwise filter or mutate lines get quite long and unwieldily. Maybe val? That might also work well as a uniform column name across both the French and English tables.

How about 'val_norm', so it retains a hint as to why it's not strictly "VALUE".

Still works in French, too 8-)