Issues geocoding large datasets
plnnr opened this issue · 5 comments
I am trying to geocode a large dataset (about 75,000 addresses or more). I get error messages, and they are very difficult to debug. For example:
Error in eval_tidy(enquo(var), var_env) : object 'V2' not found
When the error occurs, the geocoding fails and so does the variable assignment. In other words, the hour or two spent geocoding was wasted because I can't retain the results, nor can I determine which records might be giving the geocoder trouble.
Any thoughts on this issue and how to resolve it?
Thanks for opening an issue @plnnr ,
My best guess is that your data triggers a non-standard response from the Census Bureau's API at some point. This specific error occurs because the response does not contain a column with the original address.
Would it be possible to provide a sample of your data?
Alternatively, if you are able to split the data yourself and send smaller batches, you could ensure that progress isn't lost while also helping to locate the problem data.
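A minimal sketch of that batching approach, assuming a data frame `addresses` with `street`/`city`/`state`/`zip` columns and censusxy's `cxy_geocode()` (the exact argument names may differ in your version of the package):

```r
library(censusxy)

# Split the addresses into chunks of (say) 1,000 rows each
chunk_size <- 1000
chunks <- split(addresses, ceiling(seq_len(nrow(addresses)) / chunk_size))

results <- vector("list", length(chunks))

for (i in seq_along(chunks)) {
  results[[i]] <- tryCatch(
    cxy_geocode(chunks[[i]], street = "street", city = "city",
                state = "state", zip = "zip"),
    error = function(e) {
      # A failed chunk is logged rather than aborting the whole run;
      # its rows are the candidates for the problem records
      message("Chunk ", i, " failed: ", conditionMessage(e))
      NULL
    }
  )
  # Save progress after each chunk so a later failure doesn't lose earlier work
  saveRDS(results, "geocode_progress.rds")
}

# Combine the chunks that succeeded
geocoded <- do.call(rbind, results[!vapply(results, is.null, logical(1))])
```

This way a single bad batch costs you at most one chunk's worth of work, and the failing chunk narrows down which records are problematic.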
I ended up splitting the data, and strangely there were no errors (even though the subsets together contained all of the same records as the larger pull). I am unable to provide a sample because it's sensitive/personal data (voter registration records). Although splitting worked, it doesn't make sense why.
Curious, @plnnr. What size chunks did you split the data into?
@chris-prener My data had ZIP codes, so I split on that. Each ZIP code had 17k to 37k records.
Gotcha. Yeah, this seems like some kind of non-standard API response, perhaps due to a timeout, perhaps due to the volume of records you're sending. Either way, I'm glad it worked once you split by ZIP code! I'm going to go ahead and close this now, since the package itself appears to be performing as expected.