Non-ASCII characters lead to NA geocoding records
dvmasterov opened this issue · 3 comments
dvmasterov commented
Maybe this is obvious, but I am seeing some strange behavior with non-ASCII characters on macOS. Here's my version info:
> packageVersion("censusxy")
[1] ‘1.0.0’
> getRversion()
[1] ‘4.0.2’
> library(censusxy)
> library(dplyr)
> library(stringi)
> library(sf)
> # this works (as does the web UI)
> g<-cxy_single('412 45th Strèet','Oakland','CA','94609', return = 'geographies', vintage = 'Current_Current')
> summary(as.factor(g$cxy_status))
integer(0)
>
>
> # this breaks
> my_df <- data.frame(street= '412 45th Strèet', city = 'Oakland', state='CA', zip ='94609')
> geocoded_data <- cxy_geocode(my_df,
+ street = "street",
+ city = "city",
+ state = "state",
+ zip = "zip",
+ output = "full",
+ class = "dataframe",
+ return="geographies",
+ vintage ='Current_Current')
>
> summary(as.factor(geocoded_data$cxy_status))
NA's
1
>
> my_df <- my_df %>%
+ mutate_if(is.character,
+ stri_trans_general,
+ id = "latin-ascii")
>
> # this works (after transliterating to ASCII)
> geocoded_data <- cxy_geocode(my_df,
+ street = "street",
+ city = "city",
+ state = "state",
+ zip = "zip",
+ output = "full",
+ class = "dataframe",
+ return="geographies",
+ vintage ='Current_Current')
>
> summary(as.factor(geocoded_data$cxy_status))
Match
1
chris-prener commented
This is an issue on the Census Bureau's side, not with the package itself. My guess is that they can't handle non-ASCII characters. If you can transliterate them as you've done, I would recommend doing that.
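As a minimal sketch of that workaround without dplyr, the hypothetical helper `to_ascii_cols()` below (not part of censusxy) transliterates every character column with `stringi::stri_trans_general()` and the ICU `"Latin-ASCII"` transform, the same call used in the transcript. It assumes the address columns are character rather than factor; the accented letter is written as the `\u00e8` escape so the script does not depend on the source file's encoding.

```r
library(stringi)

# Hypothetical helper (not part of censusxy): transliterate every character
# column of a data frame to ASCII before passing it to cxy_geocode().
# Assumes address fields are stored as character, not factor.
to_ascii_cols <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], stri_trans_general, id = "Latin-ASCII")
  df
}

my_df <- data.frame(street = "412 45th Str\u00e8et", city = "Oakland",
                    state = "CA", zip = "94609", stringsAsFactors = FALSE)
my_df <- to_ascii_cols(my_df)
my_df$street
```

Transliteration changes the address text itself, so it is worth keeping the original column around if the geocoded output needs to be joined back to the source data.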
dvmasterov commented
I think a warning, or at least a mention of this requirement in the manual, would be a nice addition. This is not well documented on the Census side (as far as I could find). I agree that this is on their end, but it would be nice to spare future users this headache.
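One possible shape for such a warning is the pre-flight check sketched below. `warn_non_ascii()` is a hypothetical function, not an existing censusxy API; it uses `stringi::stri_enc_isascii()` to flag rows whose address columns contain non-ASCII characters, warns once, and invisibly returns the offending row numbers so they can be inspected or transliterated.

```r
# Hypothetical pre-flight check (not part of censusxy): warn when any of the
# given address columns contain non-ASCII characters, which in this issue
# caused cxy_geocode() to return NA instead of a match.
warn_non_ascii <- function(df, cols) {
  bad_rows <- integer(0)
  for (col in cols) {
    x <- as.character(df[[col]])
    # stri_enc_isascii() is FALSE for any string containing a non-ASCII
    # character; NA inputs stay NA and are dropped by which()
    hits <- which(!stringi::stri_enc_isascii(x))
    bad_rows <- union(bad_rows, hits)
  }
  if (length(bad_rows) > 0) {
    warning(sprintf(
      "%d row(s) contain non-ASCII characters and may not geocode; consider stringi::stri_trans_general(x, \"Latin-ASCII\")",
      length(bad_rows)), call. = FALSE)
  }
  invisible(sort(bad_rows))
}

addr <- data.frame(street = c("412 45th Str\u00e8et", "1 Main St"),
                   city = "Oakland", state = "CA", zip = "94609",
                   stringsAsFactors = FALSE)
rows <- warn_non_ascii(addr, c("street", "city", "state", "zip"))
```

Here `rows` is `1`, and a single warning is raised for the accented street name.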
chris-prener commented
Yeah, we can make a note of it somehow... Do you have some "real world" examples of streets with non-ASCII characters I could use?