thecodecrate/city-state

data is not in predictable format

Closed this issue · 2 comments

I've noticed an issue with the cities data: it's not in a predictable format. I've been writing a helper to deal with abbreviations, but then I noticed that some cities in the data file have the word 'saint' spelled out, others have 'st', and others have 'st.'

Here are a few examples from CS.cities(:MO, 'US'):

"Saint Albans",
"Saint Ann",
"Saint Clair",
"Saint Elizabeth",
"Saint James",
"Saint Mary",
"Saint Robert",
"Saint Thomas",
"St Louis",
"St. Charles",
"St. Genevieve",
"St. Joseph"

Hi Matt,

city-state gets its data from MaxMind without any processing; CS is only a MaxMind wrapper, so this is a MaxMind issue. As a solution, I propose adding a "translation" mechanism with some predefined rules (like Rails' inflections or i18n mechanism). For instance, in Rails it could use a "config/initializers/city-state.rb" file with an "inflections"-like structure to do the normalizations, and in plain Ruby it could use an internal file with rules like "cs.normalize /St[ .]/i, 'Saint' ". What do you think?
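
A minimal sketch of what such a rule-based normalization could look like in plain Ruby. The module name CityNameNormalizer and its RULES list are hypothetical and not part of city-state's API. The pattern is also an assumption: it adds a word boundary so that names like "West Plains" are left alone, and it consumes the trailing whitespace so the replacement can put a single space back.

```ruby
# Hypothetical helper, NOT part of the city-state gem.
module CityNameNormalizer
  # Each rule is a [pattern, replacement] pair, applied in order.
  # Assumed rule: "St" or "St." at a word boundary, followed by
  # whitespace, is rewritten to "Saint ".
  RULES = [
    [/\bSt\.?\s+/i, "Saint "]
  ].freeze

  # Apply every rule in turn to a single city name.
  def self.normalize(name)
    RULES.reduce(name) do |result, (pattern, replacement)|
      result.gsub(pattern, replacement)
    end
  end
end

names = ["Saint Ann", "St Louis", "St. Charles", "West Plains"]
normalized = names.map { |name| CityNameNormalizer.normalize(name) }
p normalized
# => ["Saint Ann", "Saint Louis", "Saint Charles", "West Plains"]
```

A rule like this would still rewrite "St" where it means "Street", so any real rule set would need to be checked against the actual data.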

👍 I think this is a great addition, considering the data that MaxMind is returning.