mountainMath/canpumf

label_pumf_data() doesn't replace codes (06) with values ("Job leavers, dissatisfied") for April 2022

Closed this issue · 4 comments

To reproduce:


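library(canpumf)
library(dplyr)  # %>% and count()
library(purrr)  # map() and map_chr()

# pumf_base_path must already point to a local directory for downloads
lfs_pumfs <- list_available_lfs_pumf_versions()

# Download every available LFS PUMF version and collect the local paths
lfs_paths <- map_chr(
  lfs_pumfs$version,
  ~ download_lfs_pumf(.x, destination_dir = file.path(pumf_base_path, "pumf", "LFS"))
)

# Read, label, and clean the column names of each version
lfs_dfs <- map(
  lfs_paths,
  function(x) {
    x %>%
      read_pumf_data() %>%
      label_pumf_data() %>%
      janitor::clean_names()
  }
)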


# lfs_dfs[[5]] is April 2022:
> lfs_dfs[[5]] %>% count(reason_for_leaving_job_during_previous_year_whyleftn)
# A tibble: 15 × 2
   reason_for_leaving_job_during_previous_year_whyleftn         n
   <chr>                                                    <int>
 1 00                                                         228
 2 01                                                         368
 3 02                                                          88
 4 03                                                          84
 5 04                                                         150
 6 05                                                        1474
 7 06                                                         426
 8 07                                                        1053
 9 08                                                          68
10 09                                                        1361
11 Job losers, business conditions (employee)                 546
12 Job losers, company moved or out of business (employee)     99
13 Job losers, dismissal or other reasons                     331
14 Job losers, end of temporary or casual (employee)         1056
15 Not applicable                                          104376



# lfs_dfs[[6]] is March 2022:
> lfs_dfs[[6]] %>% count(reason_for_leaving_job_during_previous_year_whyleftn)
# A tibble: 15 × 2
   reason_for_leaving_job_during_previous_year_whyleftn           n
   <fct>                                                      <int>
 1 Job leavers, other reasons                                   236
 2 Job leavers, own illness or disability                       314
 3 Job leavers, caring for children                             106
 4 Job leavers, pregnancy                                        78
 5 Job leavers, personal or family responsibilities             164
 6 Job leavers, going to school                                1408
 **7 Job leavers, dissatisfied                                    367**
 8 Job leavers, retired                                        1000
 9 Job leavers, business sold or closed down (self-employed)     65
10 Job losers, end of seasonal job (employee)                  1623
11 Job losers, end of temporary or casual (employee)           1072
12 Job losers, company moved or out of business (employee)       83
13 Job losers, business conditions (employee)                   584
14 Job losers, dismissal or other reasons                       334
15 Not applicable                                            100533

I have to double-check, but I'm pretty sure this is an issue with the metadata being inconsistent with the PUMF data: for that month, the codes are zero-padded in the PUMF data but not in the metadata.
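To illustrate the suspected mismatch, here is a minimal base R sketch. It does not use canpumf internals; the lookup vector, the label for code 10, and the fall-back-to-raw-code behaviour are assumptions chosen only to mimic the April 2022 output above, where two-digit codes were labelled and zero-padded ones were not.

```r
# Toy metadata lookup with unpadded keys (assumption, for illustration only).
# The pairing of code 6 with "Job leavers, dissatisfied" comes from the issue title;
# the label for code 10 is a placeholder.
meta_labels <- c("6" = "Job leavers, dissatisfied",
                 "10" = "Illustrative label for code 10")

# Toy data values: zero-padded to two characters, as in the April 2022 file.
data_codes <- c("06", "10")

# Exact string matching: "06" has no key in the lookup, "10" does.
matched <- unname(meta_labels[data_codes])

# Keep the raw code wherever no label was found.
labelled <- ifelse(is.na(matched), data_codes, matched)
print(labelled)
```

Under these assumptions, the padded code stays as a raw string while the two-digit code picks up its label, which matches the mixed column seen for April 2022.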

The best way to fix this is for StatCan to fix their metadata or make sure that things are consistent. I am a bit hesitant to add a manual fix for this, since that might break other things.

Anyway, will investigate and ping StatCan if needed.

Definitely a problem with the metadata. The April 2022 metadata does not zero-pad the codes. Punting this to StatCan to fix.

Response from StatCan:

We appreciate this discrepancy being brought to our attention. Since April, we have corrected the metadata files, which are the same for each month. Therefore, we would recommend using another month's metadata files (from May-September 2022) to read the April file.
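For anyone who needs a stopgap on their own side, the recommendation above could be approximated after the fact by relabelling the affected column with labels taken from another month. The sketch below is a user-side workaround only: `relabel_with_lookup()` is a hypothetical helper, not part of canpumf, and how the lookup is obtained from another month's metadata is left open.

```r
# Hypothetical helper (not part of canpumf): map leftover raw codes to labels,
# ignoring zero padding on both the data values and the lookup keys.
relabel_with_lookup <- function(x, lookup) {
  x <- as.character(x)
  # Strip leading zeros but keep a lone "0" ("00" -> "0", "06" -> "6").
  normalize <- function(code) sub("^0+(?=[0-9])", "", code, perl = TRUE)
  names(lookup) <- normalize(names(lookup))
  hit <- unname(lookup[normalize(x)])
  # Values without a matching code (already labelled ones) stay unchanged.
  ifelse(is.na(hit), x, hit)
}

# Toy example; the code/label pairing is the one named in the issue title.
lookup <- c("06" = "Job leavers, dissatisfied")
april_values <- c("06", "Not applicable", "06")
result <- relabel_with_lookup(april_values, lookup)
print(result)
```

Normalizing both sides means the helper works whether the borrowed metadata pads its codes or not, and values that were already labelled pass through untouched.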

Reads like they won't fix the metadata. I am quite reluctant to add manual fixes like this to the package, since that just asks for trouble down the road.

Closing this since all the old LFS have been updated by StatCan.