/geonames2sqlite

put geonames text dump tables into a SQLite database, including allCountries.txt, feature code list, admin lists, countryInfo, and hierarchy table

Primary LanguageShell

geonames2sqlite

Make a single SQLite database from the following GeoNames text dump tables:

  • allCountries.txt
  • admin1CodesASCII.txt
  • admin2Codes.txt
  • featureCodes_en.txt
  • hierarchy.txt
  • countryInfo.txt

How to Use

  • First run get_txt_dump.sh to download all the needed text files from GeoNames:
./get_txt_dump.sh
  • Then run geonames2sqlite.sh to populate a new SQLite database with these tables:
./geonames2sqlite.sh input/ output/geonames_$(date +%F).sqlite

Example Use

Look up comma separated adm codes and names no lower than the current feature, along with the geonameid, name, lat, lng, modification date, etc:

geoname_id place_name latitude longitude location_type_code location_type_name geoname_adm_code geonames_adm_name geonames_retrieval_time
1125426 Shighnan District 37.61667 71.45 ADM2 second-order administrative division AF,01,1125426 Afghanistan,Badakhshan,Shighnan District 2013-04-27T00:00:00+0000

Example SQLite flavored SQL to do so:

SELECT
    a.geonameid AS geoname_id,
    a.name AS place_name,
    a.latitude,
    a.longitude,
    f.code AS location_type_code,
    f.name AS location_type_name,
    CASE 
        WHEN f.code = 'ADM1' THEN group_concat( a.countrycode || ',' || adm1.adm1_code ) 
        WHEN f.code = 'ADM2' THEN group_concat( a.countrycode || ',' || adm1.adm1_code || ',' || adm2.adm2_code )
        ELSE group_concat( a.countrycode || ',' || a.admin1code || ',' || a.admin2code || ',' || a.admin3code || ',' || a.admin4code )
    END AS geoname_adm_code,
    CASE
        WHEN f.code = 'ADM1' THEN group_concat( cc.Country || ',' || adm1.adm1_name )
        WHEN f.code = 'ADM2' THEN group_concat( cc.Country || ',' || adm1.adm1_name || ',' || adm2.adm2_name ) 
        ELSE 
		CASE
			WHEN adm1.adm1_name IS NULL AND adm2.adm2_name IS NULL THEN cc.Country
			WHEN adm1.adm1_name IS NOT NULL AND adm2.adm2_name IS NULL THEN group_concat( cc.Country || ',' || adm1.adm1_name )
			WHEN adm1.adm1_name IS NOT NULL AND adm2.adm2_name IS NOT NULL THEN group_concat( cc.Country || ',' || adm1.adm1_name || ',' || adm2.adm2_name )
			ELSE cc.Country
		END
    END AS geonames_adm_name,
    modificationdate || 'T00:00:00+0000' AS geonames_retrieval_time
    FROM
        allCountries AS a
    LEFT JOIN featurecodes_en AS f 
        ON a.featurecode = f.code
    LEFT JOIN admin1codesascii AS adm1 
        ON adm1.adm0_code = a.countrycode AND adm1.adm1_code = a.admin1code
    LEFT JOIN admin2codes AS adm2 
        ON adm2.adm0_code = a.countrycode AND adm2.adm1_code = a.admin1code AND adm2.adm2_code = a.admin2code
    LEFT JOIN countryInfo AS cc 
        ON cc.ISO = a.countrycode
    WHERE
        a.geonameid =  '1125426'
    GROUP BY a.geonameid
;

Nota Bene

As of 2014-10-01 countryInfo.txt has an incorrect number of fields for Åland Islands due to a double tab. SQLite will also interpret the last - and entirely empty - field in countryInfo.txt as an error. The script assumes these oddities are still the case and handles them accordingly, though to ignores Åland Islands.

Note that while allCountries.geonameid is used as a foreign key for most tables, we really ought to break up allCountries into more tables for feature codes/classes and admin codes/names in order to use the appropriate foreign keys there.

TODO

SpatiaLite? Unclear if this would be helpful if not paired with other geometries like admin boundaries and roads.