/clickhouse-maxmind-geoip

A demonstration how to use ClickHouse with MaxMind GeoIP2 databases for geolocaiton

Primary LanguageDockerfile

Example of ClickHouse integration with MaxMind GeoLite2 database for geolocation.

This project contains:

  • Dictionary definitions for integrating GeoLite2 or GeoIp2 dictionaries into ClickHouse database.
  • Table definitions based on these dictionaries.
  • Query examples of how you can use them with example results.
  • Dockerfile / docker-compose.yml files for starting ClickHouse with the GeoLite2 dictionaries inside for fast experimenting.
  • A workaround to load GeoLite2-City-Locations-en.csv which ClickHouse considers corrupted because of apostrophe symbols.

More on GeoLite2/GeoIp2 dictionaries structure and content can be found here:

After loading dictionaries they have such statistics:

SELECT *
FROM system.dictionaries 
┌─name───────────────────────┬─origin───────────────────────────────────────────────────────────┬─type───┬─key──────┬─attribute.names─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─attribute.types───────────────────────────────────────────────────────────────────────────────────────────────────────┬─bytes_allocated─┬─query_count─┬─hit_rate─┬─element_count─┬─────────load_factor─┬───────creation_time─┬─source─────────────────────────────────────────────────────────────────────────┬─last_exception─┐
│ geoip_country_locations_en │ /etc/clickhouse-server/geoip_country_locations_en_dictionary.xml │ Hashed │ UInt64   │ ['locale_code','continent_code','continent_name','country_iso_code','country_name','is_in_european_union']                                                                                                                                  │ ['String','String','String','String','String','UInt8']                                                                │          160808 │           0 │        1 │           252 │          0.24609375 │ 2019-04-15 12:50:04 │ File: /etc/clickhouse-server/GeoLite2-Country-Locations-en.csv CSVWithNames    │                │
│ geoip_country_blocks_ipv6  │ /etc/clickhouse-server/geoip_country_blocks_ipv6_dictionary.xml  │ Trie   │ (String) │ ['geoname_id','registered_country_geoname_id','represented_country_geoname_id','is_anonymous_proxy','is_satellite_provider']                                                                                                                │ ['UInt32','UInt32','UInt32','UInt8','UInt8']                                                                          │        13738664 │           0 │        1 │         92570 │                   1 │ 2019-04-15 12:50:04 │ File: /etc/clickhouse-server/GeoLite2-Country-Blocks-IPv6.csv CSVWithNames     │                │
│ geoip_asn_blocks_ipv4      │ /etc/clickhouse-server/geoip_asn_blocks_ipv4_dictionary.xml      │ Trie   │ (String) │ ['autonomous_system_number','autonomous_system_organization']                                                                                                                                                                               │ ['UInt32','String']                                                                                                   │        57925936 │           0 │        1 │        428088 │                   1 │ 2019-04-15 12:49:51 │ File: /etc/clickhouse-server/GeoLite2-ASN-Blocks-IPv4.csv CSVWithNames         │                │
│ geoip_city_blocks_ipv6     │ /etc/clickhouse-server/geoip_city_blocks_ipv6_dictionary.xml     │ Trie   │ (String) │ ['geoname_id','registered_country_geoname_id','represented_country_geoname_id','is_anonymous_proxy','is_satellite_provider','postal_code','latitude','longitude','accuracy_radius']                                                         │ ['UInt32','UInt32','UInt32','UInt8','UInt8','String','Float32','Float32','UInt32']                                    │        57222376 │           0 │        1 │        440302 │                   1 │ 2019-04-15 12:50:03 │ File: /etc/clickhouse-server/GeoLite2-City-Blocks-IPv6.csv CSVWithNames        │                │
│ geoip_asn_blocks_ipv6      │ /etc/clickhouse-server/geoip_asn_blocks_ipv6_dictionary.xml      │ Trie   │ (String) │ ['autonomous_system_number','autonomous_system_organization']                                                                                                                                                                               │ ['UInt32','String']                                                                                                   │        11903280 │           0 │        1 │         55741 │                   1 │ 2019-04-15 12:49:51 │ File: /etc/clickhouse-server/GeoLite2-ASN-Blocks-IPv6.csv CSVWithNames         │                │
│ geoip_city_blocks_ipv4     │ /etc/clickhouse-server/geoip_city_blocks_ipv4_dictionary.xml     │ Trie   │ (String) │ ['geoname_id','registered_country_geoname_id','represented_country_geoname_id','is_anonymous_proxy','is_satellite_provider','postal_code','latitude','longitude','accuracy_radius']                                                         │ ['UInt32','UInt32','UInt32','UInt8','UInt8','String','Float32','Float32','UInt32']                                    │       399348968 │           0 │        1 │       3223012 │                   1 │ 2019-04-15 12:50:01 │ File: /etc/clickhouse-server/GeoLite2-City-Blocks-IPv4.csv CSVWithNames        │                │
│ geoip_city_locations_en    │ /etc/clickhouse-server/geoip_city_locations_en_dictionary.xml    │ Hashed │ UInt64   │ ['locale_code','continent_code','continent_name','country_iso_code','country_name','subdivision_1_iso_code','subdivision_1_name','subdivision_2_iso_code','subdivision_2_name','city_name','metro_code','time_zone','is_in_european_union'] │ ['String','String','String','String','String','String','String','String','String','String','UInt32','String','UInt8'] │        87644424 │           0 │        1 │        111302 │ 0.42458343505859375 │ 2019-04-15 12:50:03 │ File: /etc/clickhouse-server/GeoLite2-City-Locations-en-fixed.csv CSVWithNames │                │
│ geoip_country_blocks_ipv4  │ /etc/clickhouse-server/geoip_country_blocks_ipv4_dictionary.xml  │ Trie   │ (String) │ ['geoname_id','registered_country_geoname_id','represented_country_geoname_id','is_anonymous_proxy','is_satellite_provider']                                                                                                                │ ['UInt32','UInt32','UInt32','UInt8','UInt8']                                                                          │        28603048 │           0 │        1 │        330017 │                   1 │ 2019-04-15 12:50:03 │ File: /etc/clickhouse-server/GeoLite2-Country-Blocks-IPv4.csv CSVWithNames     │                │
└────────────────────────────┴──────────────────────────────────────────────────────────────────┴────────┴──────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────┴─────────────┴──────────┴───────────────┴─────────────────────┴─────────────────────┴────────────────────────────────────────────────────────────────────────────────┴────────────────┘

GeoLite2-City-CSV queries

SELECT 
    ip,
    -- geoip_city_blocks_ipv4 dictionary
    dictGetUInt32('geoip_city_blocks_ipv4', 'geoname_id', tuple(IPv4StringToNum(ip))) AS geoname_id, 
    dictGetString('geoip_city_blocks_ipv4', 'postal_code', tuple(IPv4StringToNum(ip))) AS postcode, 
    dictGetFloat32('geoip_city_blocks_ipv4', 'latitude', tuple(IPv4StringToNum(ip))) AS latitude, 
    dictGetFloat32('geoip_city_blocks_ipv4', 'longitude', tuple(IPv4StringToNum(ip))) AS longitude, 
    dictGetUInt32('geoip_city_blocks_ipv4', 'accuracy_radius', tuple(IPv4StringToNum(ip))) AS accuracy_radius,
    -- geoip_city_locations_en dictionary       
    dictGetString('geoip_city_locations_en', 'locale_code', toUInt64(geoname_id)) AS locale_code, 
    dictGetString('geoip_city_locations_en', 'continent_code', toUInt64(geoname_id)) AS continent_code, 
    dictGetString('geoip_city_locations_en', 'continent_name', toUInt64(geoname_id)) AS continent_name, 
    dictGetString('geoip_city_locations_en', 'country_iso_code', toUInt64(geoname_id)) AS country_iso_code, 
    dictGetString('geoip_city_locations_en', 'country_name', toUInt64(geoname_id)) AS country_name, 
    dictGetString('geoip_city_locations_en', 'subdivision_1_iso_code', toUInt64(geoname_id)) AS subdivision_1_iso_code, 
    dictGetString('geoip_city_locations_en', 'subdivision_1_name', toUInt64(geoname_id)) AS subdivision_1_name, 
    dictGetString('geoip_city_locations_en', 'subdivision_2_iso_code', toUInt64(geoname_id)) AS subdivision_2_iso_code, 
    dictGetString('geoip_city_locations_en', 'subdivision_2_name', toUInt64(geoname_id)) AS subdivision_2_name, 
    dictGetString('geoip_city_locations_en', 'city_name', toUInt64(geoname_id)) AS city_name, 
    dictGetUInt32('geoip_city_locations_en', 'metro_code', toUInt64(geoname_id)) AS metro_code, 
    dictGetString('geoip_city_locations_en', 'time_zone', toUInt64(geoname_id)) AS time_zone, 
    dictGetUInt8('geoip_city_locations_en', 'is_in_european_union', toUInt64(geoname_id)) AS is_in_european_union
FROM 
(
    SELECT arrayJoin(['129.45.17.12', '173.194.112.139', '77.88.55.66', '2.28.228.0', '95.47.254.1', '62.35.172.0']) AS ip
) 
┌─ip──────────────┬─geoname_id─┬─postcode─┬─latitude─┬─longitude─┬─accuracy_radius─┬─locale_code─┬─continent_code─┬─continent_name─┬─country_iso_code─┬─country_name───┬─subdivision_1_iso_code─┬─subdivision_1_name─┬─subdivision_2_iso_code─┬─subdivision_2_name─┬─city_name─────────────┬─metro_code─┬─time_zone──────┬─is_in_european_union─┐
│ 129.45.17.12    │    2507480 │ 16100    │  36.7405 │    3.0096 │              10 │ en          │ AF             │ Africa         │ DZ               │ Algeria        │ 16                     │ Algiers            │                        │                    │ Algiers               │            │ Africa/Algiers │ 0                    │
│ 173.194.112.139 │    6252001 │          │   37.751 │   -97.822 │            1000 │ en          │ NA             │ North America  │ US               │ United States  │                        │                    │                        │                    │                       │            │                │ 0                    │
│ 77.88.55.66     │    2017370 │          │  55.7386 │   37.6068 │            1000 │ en          │ EU             │ Europe         │ RU               │ Russia         │                        │                    │                        │                    │                       │            │                │ 0                    │
│ 2.28.228.0      │    2640910 │ EH35     │   55.913 │   -2.9398 │               5 │ en          │ EU             │ Europe         │ GB               │ United Kingdom │ SCT                    │ Scotland           │ ELN                    │ East Lothian       │ Ormiston              │            │ Europe/London  │ 1                    │
│ 95.47.254.1     │    3077311 │          │  50.0848 │   14.4112 │             100 │ en          │ EU             │ Europe         │ CZ               │ Czechia        │                        │                    │                        │                    │                       │            │ Europe/Prague  │ 1                    │
│ 62.35.172.0     │    2983987 │ 53110    │  48.4833 │   -0.4833 │             100 │ en          │ EU             │ Europe         │ FR               │ France         │ PDL                    │ Pays de la Loire   │ 53                     │ Mayenne            │ Rennes-en-Grenouilles │            │ Europe/Paris   │ 1                    │
└─────────────────┴────────────┴──────────┴──────────┴───────────┴─────────────────┴─────────────┴────────────────┴────────────────┴──────────────────┴────────────────┴────────────────────────┴────────────────────┴────────────────────────┴────────────────────┴───────────────────────┴────────────┴────────────────┴──────────────────────┘

GeoLite2-Country-CSV queries

SELECT 
    ip, 
    -- geoip_country_blocks_ipv4 dictionary
    dictGetUInt32('geoip_country_blocks_ipv4', 'geoname_id', tuple(IPv4StringToNum(ip))) AS geoname_id,
    -- geoip_country_locations_en dictionary
    dictGetString('geoip_country_locations_en', 'locale_code', toUInt64(geoname_id)) AS locale_code, 
    dictGetString('geoip_country_locations_en', 'continent_code', toUInt64(geoname_id)) AS continent_code, 
    dictGetString('geoip_country_locations_en', 'continent_name', toUInt64(geoname_id)) AS continent_name, 
    dictGetString('geoip_country_locations_en', 'country_iso_code', toUInt64(geoname_id)) AS country_iso_code, 
    dictGetString('geoip_country_locations_en', 'country_name', toUInt64(geoname_id)) AS country_name, 
    dictGetUInt8('geoip_country_locations_en', 'is_in_european_union', toUInt64(geoname_id)) AS is_in_european_union
FROM 
(
    SELECT arrayJoin(['129.45.17.12', '173.194.112.139', '77.88.55.66', '2.28.228.0', '95.47.254.1', '62.35.172.0']) AS ip
) 
┌─ip──────────────┬─geoname_id─┬─locale_code─┬─continent_code─┬─continent_name─┬─country_iso_code─┬─country_name───┬─is_in_european_union─┐
│ 129.45.17.12    │    2589581 │ en          │ AF             │ Africa         │ DZ               │ Algeria        │ 0                    │
│ 173.194.112.139 │    6252001 │ en          │ NA             │ North America  │ US               │ United States  │ 0                    │
│ 77.88.55.66     │    2017370 │ en          │ EU             │ Europe         │ RU               │ Russia         │ 0                    │
│ 2.28.228.0      │    2635167 │ en          │ EU             │ Europe         │ GB               │ United Kingdom │ 1                    │
│ 95.47.254.1     │    3077311 │ en          │ EU             │ Europe         │ CZ               │ Czechia        │ 1                    │
│ 62.35.172.0     │    3017382 │ en          │ EU             │ Europe         │ FR               │ France         │ 1                    │
└─────────────────┴────────────┴─────────────┴────────────────┴────────────────┴──────────────────┴────────────────┴──────────────────────┘

GeoLite2-ASN-CSV queries

SELECT
    ip,
    -- geoip_asn_blocks_ipv4 dictionary
    dictGetUInt32('geoip_asn_blocks_ipv4', 'autonomous_system_number', tuple(IPv4StringToNum(ip))) AS autonomous_system_number, 
    dictGetString('geoip_asn_blocks_ipv4', 'autonomous_system_organization', tuple(IPv4StringToNum(ip))) AS autonomous_system_organization 
FROM 
(
    SELECT arrayJoin(['129.45.17.12', '173.194.112.139', '77.88.55.66', '2.28.228.0', '95.47.254.1', '62.35.172.0']) AS ip
) 
┌─ip──────────────┬─autonomous_system_number─┬─autonomous_system_organization─┐
│ 129.45.17.12    │                   327931 │ Optimum-Telecom-Algeria        │
│ 173.194.112.139 │                    15169 │ Google LLC                     │
│ 77.88.55.66     │                    13238 │ YANDEX LLC                     │
│ 2.28.228.0      │                    12576 │ EE Limited                     │
│ 95.47.254.1     │                    47552 │ Vezet-Kirov Ltd.               │
│ 62.35.172.0     │                     5410 │ Bouygues Telecom SA            │
└─────────────────┴──────────────────────────┴────────────────────────────────┘

Note on IPv6

  • Use dictionaries postfixed with ..._ipv6 instead of ..._ipv4
  • Use IPv6StringToNum() instead of IPv4StringToNum()

An example:

SELECT
    ip,
    dictGetString('geoip_asn_blocks_ipv6', 'autonomous_system_organization', tuple(toFixedString(ifNull(IPv6StringToNum(ip), ''), 16))) AS autonomous_system_organization,
    dictGetFloat32('geoip_city_blocks_ipv6', 'latitude', tuple(toFixedString(ifNull(IPv6StringToNum(ip), ''), 16))) AS latitude,
    dictGetFloat32('geoip_city_blocks_ipv6', 'longitude', tuple(toFixedString(ifNull(IPv6StringToNum(ip), ''), 16))) AS longitude
FROM
(
    SELECT arrayJoin(['2001:4860:4860::8888', '2a02:6b8::feed:bad']) AS ip
)
┌─ip───────────────────┬─autonomous_system_organization─┬─latitude─┬─longitude─┐
│ 2001:4860:4860::8888 │ Google LLC                     │   37.751 │   -97.822 │
│ 2a02:6b8::feed:bad   │ YANDEX LLC                     │  55.7527 │   37.6172 │
└──────────────────────┴────────────────────────────────┴──────────┴───────────┘