ropensci/nomisr

nomis_get_data error - Can't combine `RECORD_COUNT` <double> and `RECORD_COUNT` <character>

JoannaWatson opened this issue · 6 comments

Hi, I'm having trouble using some code that appeared to work fine a few months ago but is now throwing up the error in the title.

The error occurs when trying to extract claimant count data, code as follows:

Claimant_count <- nomis_get_data(id = "NM_162_1", 
                                 #time = timesel,
                                 time = "latest",
                                 geography = "TYPE432", #district/UA as of April 2021
                                 measures = 20100,
                                 tidy = TRUE,
                                 select = c("DATE", "DATE_NAME", "GEOGRAPHY_NAME", "GEOGRAPHY_CODE", "GENDER_NAME", "MEASURE_NAME", "OBS_VALUE", "RECORD_COUNT"))

It seems to affect any variable that is of type and I wondered if there had been a change to the data type in the underlying nomis data set?

I'll look into this. Could be an underlying change in the data, could be from rate limiting kicking in if the query is larger than before.

I have the same issue with the claimant count dataset (though my initial error was for DATE_TYPECODE) with code that worked fine last month. I now also get a warning that I am trying to access more than 375000 rows, which requires a manual interaction with RStudio, whereas there were no breaks beforehand.

I initially wrote to Nomis, who responded that there have been no changes to the dataset, but that they "are aware of a problem with API downloads using the data.json output format". I am quite new to R, so I have not been able to figure out whether nomisr uses data.json or something else, but I hope this may be useful information to you.

Thanks @ammar-gla. What was the query that prompted this warning? nomis_get_data queries for CSVs, not data.json files, but the problem could affect both formats, or the Nomis API could be relying on data.json data for CSV-formatted queries.

I broke down @JoannaWatson's query and it returns some empty pages, which is what causes the "Can't combine RECORD_COUNT <double> and RECORD_COUNT <character>" error. However, the query shouldn't be retrieving empty pages, and the result is 218 rows, which, if correct, is far too small to be causing pagination issues. I'll keep investigating.
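For anyone who wants to see how an empty page can produce this kind of error, here is a minimal sketch. It is not nomisr's internal code: the two tibbles are hypothetical stand-ins for pages of one response, the values are made up, and it assumes the empty page's columns come back as character because there are no values to infer a type from. Dropping zero-row pages before combining is one possible workaround, not necessarily the fix the package will use.

```r
library(dplyr)

# Hypothetical stand-ins for two pages of one paginated response (made-up values).
page_full <- tibble(
  GEOGRAPHY_NAME = c("Area A", "Area B"),
  RECORD_COUNT = c(10, 20)
)

# Assumption: an empty page has no values to infer types from,
# so RECORD_COUNT ends up as character instead of double.
page_empty <- tibble(
  GEOGRAPHY_NAME = character(),
  RECORD_COUNT = character()
)

pages <- list(page_full, page_empty)

# Binding the pages directly fails because the RECORD_COUNT types differ,
# even though the empty page contributes no rows.
bind_error <- tryCatch(
  {
    bind_rows(pages)
    NULL
  },
  error = function(e) conditionMessage(e)
)

# Possible workaround: drop zero-row pages before combining.
non_empty <- Filter(function(p) nrow(p) > 0, pages)
combined <- bind_rows(non_empty)
```

Here `bind_error` holds the error message from the failed bind, while `combined` contains only the rows from the non-empty page, with `RECORD_COUNT` kept as a double.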

I used the query below. Though the error curiously does not happen if I only use one of the geographies instead of several, the data ends up being incomplete as it only retrieves some of the dates between Dec-2019 and the latest date, whereas it previously retrieved all dates.

Group <- c(2013265927,2092957697,1811939540,1811939541,1811939542,1811939543,1811939544,1811939526,1811939527,1811939545,1811939546,1811939547,
           1811939548,1811939528,1811939529,1811939530,1811939549,1811939550,1811939551,1811939552,1811939531,1811939532,1811939553,1811939533,
           1811939534,1811939554,1811939535,1811939555,1811939556,1811939536,1811939557,1811939537,1811939558,1811939538,1811939539)

#Retrieve claimant count data and receive error
claimant_count_stats <- nomis_get_data(
  id = "NM_162_1", 
  geography = Group,
  time = c("2019-12","latest")) 

Hi @evanodell, just to follow up on this, I tried the same code this morning and it now seems to work perfectly fine. I am not sure whether the issue was on my end (though I changed nothing from last week) or at Nomis. Thanks for looking into it!

Hi @evanodell, I've just tried to run my previous code and it is now working, so I think Nomis must have changed something. Thank you so much for looking into this for me.