Ansa.GeoNames
This ASP.NET Core 2.0 console application will populate a data store with Gazetteer data from the GeoNames geographical database. These data can already be accessed via a public API or text-file data dumps but sometimes it is useful to be able to write SQL queries to a relational database. Ansa.GeoNames wraps the excellent NGeoNames library, which handles the heavy-lifting in terms of retrieving and parsing the GeoNames data, and will generate your choice of a Sqlite or SQL Server database.
- Status
- Schema
- Installation and setup
- Configuration
- Building a Sqlite database
- Building a SQL Server database
Status
I wrote this application to address a need common to a number of my customers. It is not, as yet, comprehensive for all other use cases. The output data set includes tables for (1) GeoNames, (2) alternate Names, and (3) country info. There are not yet tables for feature codes, admin codes, time zones and the other reference lists included in the GeoNames database. I feel the provided tables do solve the most common needs of users, but I am interested in producing a fully-featured version of the application, particularly if other developers are interested.
Schema
The database schema mirrors the GeoNames Gazetteer data structure described here.
Intallation and setup
First, clone the repository to your local machine:
git clone https://github.com/rodoch/Ansa.GeoNames
Then, build the ASP.NET solution, specifying your target runtime environment, e.g.:
dotnet build -r win10-x64
Tip: See a list of target runtime identifiers here.
This will produce an .exe file and an appsettings.json file that you can grab from <PATH-TO-YOUR-APP>/bin/Debug/netcoreapp2.0/
.
Or else, you can just run the application from within Visual Studio.
Configuration
Use the appsettings.json file to configure your target database:
{
"Database": "Sqlite",
"ConnectionString": "Data Source=D:\\GeoNamesOutput\\Geonames.sqlite",
"DataSourcePath": "D:\\GeoNamesData",
"GeoNames": {
"AlternateNamesLanguageCodes": "ga,en",
"CitiesMinimumPopulation": "15000"
}
}
Key | Description | Values |
---|---|---|
Database | Specifies the type of database you wish to build | Sqlite OR SQLServer |
ConnectionString | DB connection string | Note that a connection string for a Sqlite DB must include the Data Source= prefix |
DataSourcePath | Where you intend to store the raw text files from the GeoNames data store, prior to DB input. | Directory path |
GeoNames:AlternateNamesLanguageCodes | Specify the languages for which you wish to have alternate toponymic names | Comma-separated string of ISO Alpha-2 language codes. An empty value means all languages will be retrieved. |
GeoNames:CitiesMinimumPopulation | The minimum population a population centre must have to be included in the GeoNames dataset | 1000 OR 5000 OR 15000 |
Things to note:
- You will not need to retrieve the GeoNames raw data dumps yourself. The application will do this for you. You need to specify the DataSourcePath in order to decide where these files will be stored.
- Some GeoNames raw text files range from 0.5-1.5GB unzipped, depending on the configuration you choose. Ensure you have adequate disk space.
- Probably the biggest factor in download size, processing speed, and the size of the subsequent database is the number of alternate languages you specify.
- The next biggest factor is the minimum population setting. Higher minima mean smaller file sizes.
Unsurprisingly, the SQL Server implementation writes data to the DB significantly faster. If creating a Sqlite database you might want to make yourself a cup of tea.
Building a Sqlite database
Ansa.GeoNames will create the database and all the necessary tables automatically. You just need to run the program with the appropriate configuration, e.g.
{
"Database": "Sqlite",
"ConnectionString": "Data Source=D:\\GeoNamesOutput\\GeoNames.sqlite",
"DataSourcePath": "D:\\GeoNamesData",
"GeoNames": {
"AlternateNamesLanguageCodes": "ga,en",
"CitiesMinimumPopulation": "15000"
}
}
Remember to:
- Specify "Sqlite" as your
Database
value. - Specify a connection string, including the
Data Source=
prefix.
Building a SQL Server database
- Create a new database on your SQL Server instance and give it an appropriate name, e.g. GeoNames. Set the recovery model and compatibility according to your own requirements.
Latin1_General_CI_AI
collation seems to be the best choice for this type of database. - Run the provided SQL script in
/DBScripts/SqlServer/
to generate the tables. - Specify your configuration in appsettings.json, e.g.:
{
"Database": "SQLServer",
"ConnectionString": "Server=localhost;Database=GeoNames;Trusted_Connection=True;",
"DataSourcePath": "D:\\GeoNamesData",
"GeoNames": {
"AlternateNamesLanguageCodes": "ga,en",
"CitiesMinimumPopulation": "15000"
}
}
Permissions
Please note that the database user specified in your SQL Server connection string must have ALTER
permission on the database you are writing to. This is required so that IDENTITY_INSERT
can be turned on for the duration of the data input process as we are inserting premade IDs in to the PK/Identity columns of the GeoNames and AlternateNames tables.