label_pumf_data() doesn't replace code 06 with its label "Job leavers, dissatisfied" for April 2022
Closed this issue · 4 comments
To reproduce:
lfs_pumfs <- list_available_lfs_pumf_versions()
lfs_paths <- map_chr(
  lfs_pumfs$version,
  ~ download_lfs_pumf(.x, destination_dir = file.path(pumf_base_path, "pumf", "LFS"))
)
lfs_dfs <- map(
  lfs_paths,
  function(x) {
    x %>%
      read_pumf_data() %>%
      label_pumf_data() %>%
      janitor::clean_names()
  }
)
# lfs_dfs[[5]] is April 2022:
> lfs_dfs[[5]] %>% count(reason_for_leaving_job_during_previous_year_whyleftn)
# A tibble: 15 × 2
reason_for_leaving_job_during_previous_year_whyleftn n
<chr> <int>
1 00 228
2 01 368
3 02 88
4 03 84
5 04 150
6 05 1474
7 06 426
8 07 1053
9 08 68
10 09 1361
11 Job losers, business conditions (employee) 546
12 Job losers, company moved or out of business (employee) 99
13 Job losers, dismissal or other reasons 331
14 Job losers, end of temporary or casual (employee) 1056
15 Not applicable 104376
# lfs_dfs[[6]] is March 2022:
> lfs_dfs[[6]] %>% count(reason_for_leaving_job_during_previous_year_whyleftn)
# A tibble: 15 × 2
reason_for_leaving_job_during_previous_year_whyleftn n
<fct> <int>
1 Job leavers, other reasons 236
2 Job leavers, own illness or disability 314
3 Job leavers, caring for children 106
4 Job leavers, pregnancy 78
5 Job leavers, personal or family responsibilities 164
6 Job leavers, going to school 1408
**7 Job leavers, dissatisfied 367**
8 Job leavers, retired 1000
9 Job leavers, business sold or closed down (self-employed) 65
10 Job losers, end of seasonal job (employee) 1623
11 Job losers, end of temporary or casual (employee) 1072
12 Job losers, company moved or out of business (employee) 83
13 Job losers, business conditions (employee) 584
14 Job losers, dismissal or other reasons 334
15 Not applicable 100533
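A quick way to spot this kind of partial labelling is to look for values that still look like raw numeric codes after labelling. The following is a minimal sketch in base R; `find_unlabelled_codes()` is a hypothetical helper, not part of the package, and the input vector is toy data that only mimics the April 2022 output above.

```r
# Hypothetical helper (not part of the package): return the distinct values in a
# supposedly labelled column that still consist only of digits.
find_unlabelled_codes <- function(x) {
  values <- unique(as.character(x))
  sort(values[grepl("^[0-9]+$", values)])
}

# Toy data mimicking the mixed April 2022 column (not real LFS records).
april_like <- c(
  "00", "06",
  "Job losers, dismissal or other reasons",
  "Not applicable", "06"
)

leftover <- find_unlabelled_codes(april_like)
print(leftover)
```

An empty result would suggest every code was matched to a label. A non-empty one, as here, lists the codes that were left untouched.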
I have to double-check, but I'm pretty sure that's an issue with the metadata being inconsistent with the PUMF data: for that month the codes are zero-padded in the PUMF data but not in the metadata.
The best way to fix this is for StatCan to fix their metadata or make sure that things are consistent. I am a bit hesitant to add a manual fix for this, since that might break other things.
Anyway, I will investigate and ping StatCan if needed.
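For anyone who needs a local workaround in the meantime, here is a rough sketch of the suspected mismatch and of one way to normalise around it in user code. It uses base R only. `strip_zero_padding()` and `label_codes()` are hypothetical helpers, the lookup table is toy data, and only the pairing of code 06 with "Job leavers, dissatisfied" comes from this issue; the other entry is a placeholder.

```r
# Toy metadata with unpadded codes (assumption: this mirrors the April 2022 metadata).
meta_labels <- c(
  "6"  = "Job leavers, dissatisfied",
  "10" = "placeholder label for code 10"  # placeholder, not a real LFS label
)

# Toy data with zero-padded codes, as seen in the April 2022 file.
data_codes <- c("06", "10", "06")

# A direct lookup misses "06" because the metadata only knows "6".
direct_hit <- data_codes %in% names(meta_labels)

# Hypothetical helper: drop leading zeros but keep a final digit ("00" becomes "0").
strip_zero_padding <- function(codes) {
  sub("^0+(?=[0-9])", "", codes, perl = TRUE)
}

# Hypothetical helper: match codes to labels after normalising both sides;
# codes without a matching label are returned unchanged.
label_codes <- function(codes, labels) {
  keys <- strip_zero_padding(codes)
  names(labels) <- strip_zero_padding(names(labels))
  hit <- keys %in% names(labels)
  out <- codes
  out[hit] <- unname(labels[keys[hit]])
  out
}

labelled <- label_codes(data_codes, meta_labels)
print(labelled)
```

This kind of normalisation is only safe when no variable uses both a padded and an unpadded form as distinct codes, which is one reason a blanket fix inside the package is risky.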
Definitely a problem with the metadata. The April 2022 metadata does not zero-pad the codes, while the data file does. Punting this to StatCan to fix.
Response from StatCan:
We appreciate this discrepancy being brought to our attention. Since April, we have corrected the metadata files, which are the same for each month. Therefore, we would recommend using another month's metadata files (from May-September 2022) to read the April file.
Reads like they won't fix the April metadata. I am quite reluctant to add manual fixes like this to the package, since that just asks for trouble down the road.
Closing this since all the old LFS have been updated by StatCan.