logstash-plugins/logstash-filter-geoip

Need more information extracted from GeoIP database not only lat or lng

keefeleen opened this issue · 8 comments

We need to extract province and city information from the MaxMind GeoIP2 database when using the Logstash geoip filter.

But it seems that the default Logstash geoip plugin only provides "latitude" and "longitude" information.
We actually wrote our own plugin to extract this information, but we strongly recommend that the official maintainers update this plugin. Could a future version add more fields that can be extracted from GeoIP2 databases?

Thanks in advance and waiting for your reply.

This report is surprising, since the default list of fields is already quite exhaustive and includes city_name: https://github.com/logstash-plugins/logstash-filter-geoip/blob/master/lib/logstash/filters/geoip.rb#L67
You can also configure the fields to add the desired level of subdivision, as documented here: http://dev.maxmind.com/geoip/geoip2/geoip2-city-country-csv-databases/

@keefeleen do you have a restricted "fields" setting in your configuration?

Thanks for your reply. I checked again and, just as you said, the default fields already contain the information we need. I then found the problem that caused the failure when we tried to extract the city_name information.

The GeoIP filter uses the GeoIP2-java library. We found that before reading the database, the DatabaseReader class compares the database's "database_type" metadata with the name of the data type we request (e.g. "City"). If "database_type" doesn't contain that name, it throws an exception and doesn't return the IP data we want.

I think the related code is as follows in DatabaseReader.java:

    String databaseType = this.getMetadata().getDatabaseType();
    if (!databaseType.contains(type)) {
        String caller = Thread.currentThread().getStackTrace()[2]
                .getMethodName();
        throw new UnsupportedOperationException(
                "Invalid attempt to open a " + databaseType
                        + " database using the " + caller + " method");
    }
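
To make the effect of this check concrete, here is a minimal Ruby sketch that mirrors the Java logic quoted above. It is only an illustration: `check_database_type`, `city_lookup_allowed?`, and `UnsupportedOperationError` are hypothetical names, not part of GeoIP2-java or the Logstash plugin, and "Custom-Geo-DB" and "custom-city-db" are made-up type strings. The "City" argument is the one passed by DatabaseReader's city() method.

```ruby
# Hypothetical stand-in for Java's UnsupportedOperationException.
class UnsupportedOperationError < StandardError; end

# Mirrors the quoted check: the database_type metadata must contain the
# requested type name. String#include? is case-sensitive, like Java's
# String.contains.
def check_database_type(database_type, type, caller_name)
  return if database_type.include?(type)

  raise UnsupportedOperationError,
        "Invalid attempt to open a #{database_type} database " \
        "using the #{caller_name} method"
end

# Returns true when a city lookup would pass the check, false otherwise.
def city_lookup_allowed?(database_type)
  check_database_type(database_type, "City", "city")
  true
rescue UnsupportedOperationError
  false
end

puts city_lookup_allowed?("GeoIP2-City")    # type string contains "City"
puts city_lookup_allowed?("Custom-Geo-DB")  # made-up type without "City"
puts city_lookup_allowed?("custom-city-db") # lowercase "city" does not match
```

Under these assumptions, only the first call passes. A custom database would need "City" (with that exact capitalization) somewhere in its "database_type" metadata for a city lookup to get past the check.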

We would like to know the exact purpose of this check, and whether there is a way to avoid the failure when "database_type" does not contain "City" or "Country" but we still want to read city or country information.

Thanks in advance.

@wiibaa hello, can you take a look at the question above?

Another team in our company built our own city-level GeoIP database, which follows MaxMind's standard, and provided it for us to use. But the "database_type" they defined doesn't contain "City", so we cannot use the Logstash GeoIP plugin with it.

Is there any way to avoid asking them to rebuild the database and make logstash GeoIP plugin compatible with the existing database?

@keefeleen I'm sorry, but this is how the MaxMind DatabaseReader seems to work. Logstash is simply calling this method:

    @Override
    public CityResponse city(InetAddress ipAddress) throws IOException,
            GeoIp2Exception {
        return this.get(ipAddress, CityResponse.class, "City");
    }

So your custom database must define the proper type, otherwise the MaxMind library cannot use it. This is a compatibility issue between your database and the MaxMind library, and Logstash cannot help much.

The logstash plugin does treat lat-lon specially. Even if the database has the right entries for e.g. city or country, the record is thrown out by logstash if it doesn't have a lat-lon.

geoip.rb:

    if location.getLatitude().nil? && location.getLongitude().nil?
      return
    end

This causes problems even with the official MaxMind databases. The GeoIP2-City-Europe DB, for example, has continent/country codes but no location fields for places outside Europe.

@joewreschnig so you mean this assumption is wrong?

  # if location is empty, there is no point populating geo data
  # and most likely all other fields are empty as well

Could you provide some examples? I cannot easily find a description of such cases in the MaxMind documentation.

Yes, I believe that assumption is wrong, even for official MaxMind DBs. For example, when I look up a US IP in the GeoIP2-City-Europe DB, I get only the country, no location:

  # mmdblookup -f /var/lib/GeoIP/GeoIP2-City-Europe.mmdb -i 8.8.8.8
  {
    "continent": 
      {
        "code": 
          "NA" <utf8_string>
        "geoname_id": 
          6255149 <uint32>
        "names": 
          {
            "de": 
              "Nordamerika" <utf8_string>
            "en": 
              "North America" <utf8_string>
            "es": 
              "Norteamérica" <utf8_string>
            "fr": 
              "Amérique du Nord" <utf8_string>
            "ja": 
              "北アメリカ" <utf8_string>
            "pt-BR": 
              "América do Norte" <utf8_string>
            "ru": 
              "Северная Америка" <utf8_string>
            "zh-CN": 
              "北美洲" <utf8_string>
          }
      }
    "country": 
      {
        "geoname_id": 
          6252001 <uint32>
        "iso_code": 
          "US" <utf8_string>
        "names": 
          {
            "de": 
              "USA" <utf8_string>
            "en": 
              "United States" <utf8_string>
            "es": 
              "Estados Unidos" <utf8_string>
            "fr": 
              "États-Unis" <utf8_string>
            "ja": 
              "アメリカ合衆国" <utf8_string>
            "pt-BR": 
              "Estados Unidos" <utf8_string>
            "ru": 
              "США" <utf8_string>
            "zh-CN": 
              "美国" <utf8_string>
          }
      }
    "registered_country": 
      {
        "geoname_id": 
          6252001 <uint32>
        "iso_code": 
          "US" <utf8_string>
        "names": 
          {
            "de": 
              "USA" <utf8_string>
            "en": 
              "United States" <utf8_string>
            "es": 
              "Estados Unidos" <utf8_string>
            "fr": 
              "États-Unis" <utf8_string>
            "ja": 
              "アメリカ合衆国" <utf8_string>
            "pt-BR": 
              "Estados Unidos" <utf8_string>
            "ru": 
              "США" <utf8_string>
            "zh-CN": 
              "美国" <utf8_string>
          }
      }
  }

If I remove the check (and put appropriate guards around the assignments) the plugin handles the data just fine - I get a continent and country.
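
A minimal sketch of what "guards around the assignments" could look like, assuming plain Ruby hashes in place of the GeoIP2 response objects. This is not the plugin's actual code: `build_geo_data` is a hypothetical helper, the output keys are chosen for illustration, and the sample record is a trimmed version of the mmdblookup output above.

```ruby
# Illustrative only: hashes stand in for the GeoIP2 response objects, and
# build_geo_data is a hypothetical helper, not the plugin's real method.
def build_geo_data(record)
  continent = record["continent"] || {}
  country   = record["country"] || {}
  city      = record["city"] || {}
  location  = record["location"] || {}

  geo = {}
  # Guard each assignment individually instead of returning early when
  # latitude and longitude are both missing.
  geo["continent_code"] = continent["code"] if continent["code"]
  geo["country_code"] = country["iso_code"] if country["iso_code"]
  country_name = country.dig("names", "en")
  geo["country_name"] = country_name if country_name
  city_name = city.dig("names", "en")
  geo["city_name"] = city_name if city_name
  geo["latitude"] = location["latitude"] if location["latitude"]
  geo["longitude"] = location["longitude"] if location["longitude"]
  geo
end

# Trimmed version of the 8.8.8.8 record above: continent and country only,
# with no "location" or "city" entries.
us_record = {
  "continent" => { "code" => "NA", "names" => { "en" => "North America" } },
  "country"   => { "iso_code" => "US", "names" => { "en" => "United States" } }
}

p build_geo_data(us_record)
```

With this shape, a record that has no location still yields the continent and country entries, and the latitude and longitude keys are simply absent.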

@joewreschnig very interesting. It's true that historically the geoip filter was mainly used to retrieve the lat/lon for a map widget in Kibana, but that should not be the only use case.