ropensci/nomisr

Suppression rules in nomisr v nomis web

ninarobery opened this issue · 6 comments

Are different suppression rules applied in nomisr compared to Nomis web? For the following indicator: % Unemployed with health conditions or illnesses lasting more than 12 months (aged 16+), some numerators are missing for areas in nomisr, but they are available in Nomis web.

For example for Bury Jan-Dec 2018
Nomis web has the following information:
Numerator 1,200
Denominator 48,900
Percentage 2.5

Using the following R code in nomisr:
library(dplyr)
library(nomisr)

Bury_data <- nomis_get_data(id = "NM_17_5", date = "previousMINUS2", geography = "1946157082", variable = 1716, measures = c(20599, 21001, 21002)) %>%
select(DATE_NAME, GEOGRAPHY, GEOGRAPHY_NAME, GEOGRAPHY_CODE, VARIABLE_NAME, MEASURES_NAME, OBS_VALUE)

nomisr returns:
Numerator NA
Denominator 48,900
Percentage NA

Given that in nomisr the data is showing for the following associated indicators:
% In employment with health conditions or illnesses lasting more than 12 months (aged 16+)
% Inactive with health conditions or illnesses lasting more than 12 months (aged 16+)

It would mean you could calculate the missing numerator and percentage for the unemployed.
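
A minimal sketch of that calculation, assuming the employed, unemployed and inactive numerators share one denominator and sum to it (the thread does not confirm this). The function name `derive_unemployed` and the employed and inactive figures are hypothetical placeholders; only the denominator of 48,900 comes from the Nomis web figures above. Published values may be rounded, so a derived figure may not match exactly.

```r
# Derive a missing category by subtraction.
# ASSUMPTION: employed + unemployed + inactive == denominator.
derive_unemployed <- function(denominator, employed, inactive) {
  numerator <- denominator - employed - inactive
  list(
    numerator = numerator,
    percentage = round(100 * numerator / denominator, 1)
  )
}

# Denominator is from the thread; employed and inactive are
# HYPOTHETICAL placeholders, not real Nomis figures.
res <- derive_unemployed(denominator = 48900,
                         employed = 30000,
                         inactive = 17700)
print(res$numerator)
print(res$percentage)
```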

If you could help clarify this, it would be really helpful - thank you.

nomisr calls the Nomis web API, which I assume applies the same rules. Could you check that this isn't the same problem as #18, and that you're using the same measures in the API call as on the website?

Hi @ninarobery - I've just taken a look. It looks like you need to include the confidence interval measure in your query for the variable and numerator records to have an OBS_VALUE other than NA:

This works:

library(nomisr)
library(dplyr)
Bury_data <- nomis_get_data(id = "NM_17_5", 
                            date = "previousMINUS2", 
                            geography = 1946157082, 
                            variable = 1716, 
                            measures = c(20599, 21001, 21002, 21003)) %>%
        select(DATE_NAME, GEOGRAPHY, GEOGRAPHY_NAME, GEOGRAPHY_CODE, VARIABLE_NAME, MEASURES_NAME, OBS_VALUE)

This returns NA values in the OBS_VALUE field:

library(nomisr)
library(dplyr)
Bury_data <- nomis_get_data(id = "NM_17_5", 
                            date = "previousMINUS2", 
                            geography = 1946157082, 
                            variable = 1716, 
                            measures = c(20599, 21001, 21002)) %>%
        select(DATE_NAME, GEOGRAPHY, GEOGRAPHY_NAME, GEOGRAPHY_CODE, VARIABLE_NAME, MEASURES_NAME, OBS_VALUE)

I don't know much about the package or the API, but is this expected behaviour, @evanodell?

That's not expected behaviour, but it is what you get when passing requests directly to the API through other methods and in other formats, i.e. the behaviour stems from the API itself rather than the R code. You can also leave the measures parameter as NULL, which returns all measures by default. E.g.

Bury_data <- nomis_get_data(id = "NM_17_5", 
                            date = "previousMINUS2", 
                            geography = 1946157082, 
                            variable = 1716) %>%
        filter(MEASURES != 21003) %>% 
        select(DATE_NAME, GEOGRAPHY, GEOGRAPHY_NAME, GEOGRAPHY_CODE, 
               VARIABLE_NAME, MEASURES_NAME, OBS_VALUE)

I suspect it is to do with how the server processes the data, and that it requires the confidence interval to determine suppression requirements (although oddly it is not reported in this case). I will get in touch with the Nomis team and see what they have to say.
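
Until that is resolved, a quick check for affected rows may help. The sketch below is only illustrative: `na_measures` is a hypothetical helper, and the data frame is a hand-built stand-in that mirrors the Bury result reported above (with the labels used in this thread) rather than a live call to `nomis_get_data()`.

```r
# Return the measure labels whose OBS_VALUE is NA.
na_measures <- function(df) {
  df$MEASURES_NAME[is.na(df$OBS_VALUE)]
}

# Stand-in for the result of the three-measure query above;
# values are those reported in this thread, not a live API response.
bury <- data.frame(
  MEASURES_NAME = c("Numerator", "Denominator", "Percentage"),
  OBS_VALUE = c(NA, 48900, NA),
  stringsAsFactors = FALSE
)

na_rows <- na_measures(bury)
if (length(na_rows) > 0) {
  message("NA values for: ", paste(na_rows, collapse = ", "),
          "; try adding measure 21003 or leaving measures = NULL")
}
```

If the helper reports any measures, re-running the query with the confidence measure included, as in the working example above, is the workaround found so far.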

I've received the following from the Nomis team:

Thank you for your email, this issue happens because of the way we flag values on that dataset... and hopefully in the coming months it will be fixed as we are implementing new flagging on this dataset.

The problem is caused because traditionally we used the confidence column to flag the reliability of the value on outputs. Under the new model, we should be flagging the actual variable with an unreliable status.

Thank you @evanodell