ip2location/ip2location-go

New library

pg9182 opened this issue · 10 comments

I've written a new library for querying IP2Location and IP2Proxy databases, github.com/pg9182/ip2x. I'm currently refactoring the code generation and refining the library interface, but it's more or less feature-complete.

  • It supports Go 1.18+.
  • It supports querying using Go 1.18's new net/netip.Addr type, which is much more efficient than parsing the IP from a string every time.
  • It uses native integer types instead of big.Int, which is also much more efficient.
  • It's about 11x faster than this library when querying a single field, and 2x faster for all fields, while making a fraction of the number of allocations (2 for init, 1 for each lookup, plus 1 for each typed field get, or 2 for an untyped one).
  • It has comprehensive built-in documentation, including automatically-generated information about which fields are available in different product types.
  • It supports querying information about the database itself, for example, whether it supports IPv6, and which fields are available.
  • It has a more fluent and flexible API (e.g., record.Get(ip2x.Latitude), record.GetString(ip2x.Latitude), record.GetFloat(ip2x.Latitude))
  • It has built-in support for pretty-printing records as strings or JSON.
  • It supports both IP2Location and IP2Proxy databases in a single package with a unified API.
  • It uses code generation to simplify adding new products/types/fields/documentation while reducing the likelihood of bugs (input, docs).
  • It's written in idiomatic Go: correct error handling (rather than stuffing error strings into the record struct), useful zero values (an empty record will work properly), proper type names, etc.
  • There are tests to ensure that the output is consistent with this library, that a range of IPv4 addresses (and their possible IPv6 mappings) work correctly, and other things. There are also fuzz tests to ensure that IPs can't crash the library and are IPv4/v6-mapped correctly.

This library is already being used in production at Northstar for game server geolocation and log analysis.


```
$ cd test && go test -bench=. -benchmem .
db: IP2Location DB11 2022-10-29 [city,country_code,country_name,latitude,longitude,region,time_zone,zip_code] (IPv4+IPv6)
goos: linux
goarch: amd64
pkg: github.com/pg9182/ip2x/test
cpu: AMD Ryzen 5 5600G with Radeon Graphics
BenchmarkIP2x_Init-12                       	17850333	        67.91 ns/op	     128 B/op	       2 allocs/op
BenchmarkIP2x_LookupOnly-12                 	18722506	        61.36 ns/op	      48 B/op	       1 allocs/op
BenchmarkIP2x_GetAll-12                     	 1522696	       812.2 ns/op	    1688 B/op	      14 allocs/op
BenchmarkIP2x_GetOneString-12               	 7839385	       144.1 ns/op	     304 B/op	       2 allocs/op
BenchmarkIP2x_GetOneFloat-12                	14312419	        84.16 ns/op	      48 B/op	       1 allocs/op
BenchmarkIP2x_GetTwoString-12               	 4243560	       244.9 ns/op	     560 B/op	       3 allocs/op
BenchmarkIP2x_GetTwoFloat-12                	12198259	       101.1 ns/op	      48 B/op	       1 allocs/op
BenchmarkIP2x_GetNonexistent-12             	14834245	        79.85 ns/op	      48 B/op	       1 allocs/op
BenchmarkIP2LocationV9_Init-12              	  602967	      2191 ns/op	     400 B/op	       7 allocs/op
BenchmarkIP2LocationV9_LookupOnly-12        	 1473849	       782.6 ns/op	     672 B/op	      24 allocs/op
BenchmarkIP2LocationV9_GetAll-12            	  819900	      1324 ns/op	    2268 B/op	      36 allocs/op
BenchmarkIP2LocationV9_GetOneString-12      	 1346534	       889.2 ns/op	     936 B/op	      26 allocs/op
BenchmarkIP2LocationV9_GetOneFloat-12       	 1441219	       795.0 ns/op	     672 B/op	      24 allocs/op
BenchmarkIP2LocationV9_GetTwoString-12      	  546868	      1866 ns/op	    1883 B/op	      53 allocs/op
BenchmarkIP2LocationV9_GetTwoFloat-12       	  693019	      1561 ns/op	    1345 B/op	      49 allocs/op
BenchmarkIP2LocationV9_GetNonexistent-12    	 1399872	       795.5 ns/op	     672 B/op	      24 allocs/op
```
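Each result line above has the following parts:

- the benchmark name, with a `-12` suffix giving the GOMAXPROCS value;
- the iteration count;
- ns/op;
- B/op and allocs/op, which come from `-benchmem`.

The ip2x benchmark sources aren't reproduced in this thread, so what follows is only a hedged sketch of the same measurement style. It is run through `testing.Benchmark` so it works as a plain program. It compares re-parsing an address string on every call with reusing a parsed `netip.Addr`, the efficiency point made earlier about `net/netip`. It calls neither ip2x nor ip2location-go, and the function names are made up.

```go
package main

import (
	"fmt"
	"net/netip"
	"testing"
)

const addrString = "71.68.178.128"

var (
	parsed = netip.MustParseAddr(addrString)
	sink   netip.Addr // keeps results alive so the loops are not optimised away
)

// benchParseEach re-parses the address string on every iteration, as an API
// that only accepts strings would have to.
func benchParseEach(b *testing.B) {
	b.ReportAllocs() // report B/op and allocs/op, like -benchmem
	for i := 0; i < b.N; i++ {
		sink = netip.MustParseAddr(addrString)
	}
}

// benchReuse reuses an already-parsed netip.Addr.
func benchReuse(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = parsed
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	for _, bm := range []struct {
		name string
		fn   func(*testing.B)
	}{
		{"ParseEach", benchParseEach},
		{"Reuse", benchReuse},
	} {
		r := testing.Benchmark(bm.fn)
		fmt.Printf("%-10s %s\t%s\n", bm.name, r.String(), r.MemString())
	}
}
```

Absolute timings depend on the machine, so only the relative difference between the two rows is meaningful.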

Here's a summary of the benchmarks in a more readable form:

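| op | ns/op: ip2x | ns/op: ip2loc/v9 | allocs/op: ip2x | allocs/op: ip2loc/v9 | B/op: ip2x | B/op: ip2loc/v9 |
|---|--:|--:|--:|--:|--:|--:|
| Init | 68 (-97%) | 2191 (32.3x) | 2 (-5) | 7 (3.5x) | 128 (-68%) | 400 (3.1x) |
| LookupOnly | 61 (-92%) | 783 (12.8x) | 1 (-23) | 24 (24.0x) | 48 (-93%) | 672 (14.0x) |
| GetAll | 812 (-39%) | 1324 (1.6x) | 14 (-22) | 36 (2.6x) | 1688 (-26%) | 2268 (1.3x) |
| GetOneString | 144 (-84%) | 889 (6.2x) | 2 (-24) | 26 (13.0x) | 304 (-68%) | 936 (3.1x) |
| GetOneFloat | 84 (-89%) | 795 (9.4x) | 1 (-23) | 24 (24.0x) | 48 (-93%) | 672 (14.0x) |
| GetTwoString | 245 (-87%) | 1866 (7.6x) | 3 (-50) | 53 (17.7x) | 560 (-70%) | 1883 (3.4x) |
| GetTwoFloat | 101 (-94%) | 1561 (15.4x) | 1 (-48) | 49 (49.0x) | 48 (-96%) | 1345 (28.0x) |
| GetNonexistent | 80 (-90%) | 796 (10.0x) | 1 (-23) | 24 (24.0x) | 48 (-93%) | 672 (14.0x) |

In the ip2x columns, the figure in parentheses is the change relative to ip2loc/v9: a percentage for ns/op and B/op, and an absolute difference for allocs/op. In the ip2loc/v9 columns, it is the ip2loc/v9 value divided by the ip2x value.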
Code used to generate the table (it reads `go test -bench=. -benchmem` output on stdin and prints an HTML table):

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"os"
	"regexp"
	"strconv"
)

// re matches one benchmark result line, capturing:
// 1=library, 2=operation, 3=iterations, 4=ns/op, 5=B/op, 6=allocs/op.
var re = regexp.MustCompile(`(?m)^Benchmark([^_]+)_([^-]+)[^\s]+\s+([0-9.]+)\s+([0-9.]+) ns/op\s+([0-9.]+) B/op\s+([0-9.]+) allocs/op`)

func main() {
	// result holds the parsed metrics for one benchmark.
	type result struct {
		Count       int64
		Nanoseconds float64
		Bytes       float64
		Allocs      float64
	}
	var (
		bylib       = map[string]map[string]result{} // library -> operation -> result
		benchnames  = []string{}                      // operations, in first-seen order
		benchnamesm = map[string]struct{}{}           // set of operations already seen
	)
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		row := re.FindStringSubmatch(sc.Text())
		if row == nil {
			continue // not a benchmark result line
		}
		if _, seen := benchnamesm[row[2]]; !seen {
			benchnames = append(benchnames, row[2])
			benchnamesm[row[2]] = struct{}{}
		}
		if _, ok := bylib[row[1]]; !ok {
			bylib[row[1]] = map[string]result{}
		}
		var res result
		res.Count, _ = strconv.ParseInt(row[3], 10, 64)
		res.Nanoseconds, _ = strconv.ParseFloat(row[4], 64)
		res.Bytes, _ = strconv.ParseFloat(row[5], 64)
		res.Allocs, _ = strconv.ParseFloat(row[6], 64)
		bylib[row[1]][row[2]] = res
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
	const (
		lib1 = "IP2x"
		lib2 = "IP2LocationV9"
	)
	fmt.Println(`<table>`)
	fmt.Println(`<thead>`)
	fmt.Println(`<tr>`)
	fmt.Println(`<th rowspan="2">op</th>`)
	fmt.Println(`<th colspan="2">ns</th>`)
	fmt.Println(`<th colspan="2">allocs</th>`)
	fmt.Println(`<th colspan="2">bytes</th>`)
	fmt.Println(`</tr>`)
	fmt.Println(`<tr>`)
	fmt.Println(`<th>ip2x</th><th>ip2loc/v9</th>`)
	fmt.Println(`<th>ip2x</th><th>ip2loc/v9</th>`)
	fmt.Println(`<th>ip2x</th><th>ip2loc/v9</th>`)
	fmt.Println(`</tr>`)
	fmt.Println(`</thead>`)
	fmt.Println(`<tbody>`)
	for _, benchname := range benchnames {
		r1 := bylib[lib2][benchname] // ip2loc/v9 (baseline)
		r2 := bylib[lib1][benchname] // ip2x

		// Allocation counts are compared as absolute differences, the rest as percentages.
		fmt.Printf("<tr>\n<td><b>%s</b></td>\n%s\n%s\n%s\n</tr>\n", benchname,
			trow(true, r1.Nanoseconds, r2.Nanoseconds),
			trow(false, r1.Allocs, r2.Allocs),
			trow(true, r1.Bytes, r2.Bytes),
		)
	}
	fmt.Println(`</tbody>`)
	fmt.Println(`</table>`)
}

// trow renders the two cells for one metric, where v1 is the ip2loc/v9
// baseline and v2 is ip2x. The ip2x cell shows v2 and its change from v1
// (a percentage if pct is set, otherwise an absolute difference); the
// ip2loc/v9 cell shows v1 and the ratio v1/v2.
func trow(pct bool, v1, v2 float64) string {
	c, cc := v2-v1, ""
	if pct {
		c /= math.Abs(v1)
		c *= 100
		cc = "%"
	}
	d := v1 / v2
	return fmt.Sprintf(`<td align="right"><b>%.0f</b><br/><small><i>%+.0f%s</i></small></td><td align="right"><b>%.0f</b><br/><small><i>%.1fx</i></small></td>`, v2, c, cc, v1, d)
}
```

I've done a few more optimizations and some refactoring. I've also added automatic verification that ip2x's output matches this library's for every row in a few IP2Location databases (you can run it against any of them locally).

With this, I think I'm more or less finished with ip2x.

It's been stable for a few months now, so I've released v1.

cool!

I've added support for DB26, but I can't test it since the sample database seems to return corrupt data, even when using the official library.

Result for 71.68.178.128 (chosen at random from the CSV version) in the sample DB26 database (SHA1: 43ab840159c7c421f3b6620ea9670f8485dfe53d):

| Field | ip2location-go@520cede | pg9182/ip2x@296a65e |
|---|---|---|
| address_type | `"Pv6 ranges.\x01U\x01-\x04IAB1\x06IAB1-1\x06IAB1-2\x06IAB1-3\x06IAB1-4\x06IAB1-5\x06IAB1-6\x06IAB1-7\x05IAB"` | `"Pv6 ranges.\x01U\x01-\x04IAB1\x06IAB1-1\x06IAB1-2\x06IAB1-3\x06IAB1-4\x06IAB1-5\x06IAB1-6\x06IAB1-7\x05IAB"` |
| area_code | `"915"` | `"915"` |
| as | `<nil>` | `"munications\x1aCharter Communications Inc-Chartres Metropole Innovations Numeriques SEM\x1dChartway Federal Credit "` |
| asn | `<nil>` | `"11414"` |
| category | `"16\bIAB19-17\bIAB19-18\bIAB19-19\aIAB19-2\bIAB19-2"` | `"16\bIAB19-17\bIAB19-18\bIAB19-19\aIAB19-2\bIAB19-2"` |
| city | `"usen\x06Wendel\aWendell\vWendelsheim\vWendelstein\x06Wenden\fWendens Ambo\x12Wendisch Borschutz\nWendishain\bWen"` | `"usen\x06Wendel\aWendell\vWendelsheim\vWendelstein\x06Wenden\fWendens Ambo\x12Wendisch Borschutz\nWendishain\bWen"` |
| country_code | `"ing Islands\x02US\x18United States of America\x02UY\aUruguay\x02UZ\nUzbekistan\x02VA\bHoly See\x02VC Saint Vincent and The Grenadines\x02VE\"Venez"` | `"ing Islands\x02US\x18United States of America\x02UY\aUruguay\x02UZ\nUzbekistan\x02VA\bHoly See\x02VC Saint Vincent and The Grenadines\x02VE\"Venez"` |
| country_name | `" Islands\x02US\x18United States of America\x02UY\aUruguay\x02UZ\nUzbekistan\x02VA\bHoly See\x02VC Saint Vincent and The Gren"` | `" Islands\x02US\x18United States of America\x02UY\aUruguay\x02UZ\nUzbekistan\x02VA\bHoly See\x02VC Saint Vincent and The Gren"` |
| domain | `"systems.com\fspectrum.com\x0fspectrum.com.au\fspec"` | `"systems.com\fspectrum.com\x0fspectrum.com.au\fspec"` |
| district | `<nil>` | `"akayama Shi\vWake County\bWake-gun\tWakefield\fWakkanai Shi\bWako-shi\x0eWakulla County\x06Walcha\f"` |
| elevation | `947` | `"947"` |
| idd_code | `"6 ranges.\x01-\x011\x041242\x041246\x041264\x041268\x041284\x041340\x041345\x041441\x041473\x041649\x041664\x041670\x041671\x041684\x041721\x041758\x041767\x041784\x041829\x041868\x041869"` | `"6 ranges.\x01-\x011\x041242\x041246\x041264\x041268\x041284\x041340\x041345\x041441\x041473\x041649\x041664\x041670\x041671\x041684\x041721\x041758\x041767\x041784\x041829\x041868\x041869"` |
| isp | `"Holding Com\x1aCharter Communicatio"` | `"Holding Com\x1aCharter Communicatio"` |
| last_seen | `<nil>` | `<nil>` |
| latitude | `35.78099` | `35.78099` |
| longitude | `-78.36972` | `-78.36972` |
| mcc | `"ckau\x06Zwolle\x01-\x03202\x03204\x03206\x03208\x03213\x03214\x03216\x03218\x03219\x03220\x03221\x03222\x03226\x03228\x03230\x03231\x03232\x03234\x03238\x03240\x03242\x03244\x03246"` | `"ckau\x06Zwolle\x01-\x03202\x03204\x03206\x03208\x03213\x03214\x03216\x03218\x03219\x03220\x03221\x03222\x03226\x03228\x03230\x03231\x03232\x03234\x03238\x03240\x03242\x03244\x03246"` |
| mnc | `"Pv6 ranges.\x01-\x0200\x0500/02\x0500/76\a000/120\x03001\x0f004/005/006/012\x0201\x0501/02\b01/02/0"` | `"Pv6 ranges.\x01-\x0200\x0500/02\x0500/76\a000/120\x03001\x0f004/005/006/012\x0201\x0501/02\b01/02/0"` |
| mobile_brand | `".\t+7Telecom\x01-$1O1O / One2Free / New World Mobility\b2degrees\x013\x063 (2G)\x043Mob\x034ka\a9mobile\x02A1\x06A1.net\x03AIS\x04APTG\x10ASTELNET, "` | `".\t+7Telecom\x01-$1O1O / One2Free / New World Mobility\b2degrees\x013\x063 (2G)\x043Mob\x034ka\a9mobile\x02A1\x06A1.net\x03AIS\x04APTG\x10ASTELNET, "` |
| net_speed | `"-"` | `"-"` |
| provider | `<nil>` | `<nil>` |
| proxy_type | `<nil>` | `<nil>` |
| region | `"\nNorth Bank\x0eNorth Carolina\x16North Central Province\fNorth Dakota\fNorth Darfur\nNorth East\x0fNorth Eleuthera\x0eNorth Kordof"` | `"\nNorth Bank\x0eNorth Carolina\x16North Central Province\fNorth Dakota\fNorth Darfur\nNorth East\x0fNorth Eleuthera\x0eNorth Kordof"` |
| threat | `<nil>` | `<nil>` |
| time_zone | `"2:30\x06-03:00\x06-04:00\x06-05:00\x06-06:00\x06-07:00\x06-08:00\x06-"` | `"2:30\x06-03:00\x06-04:00\x06-05:00\x06-06:00\x06-07:00\x06-08:00\x06-"` |
| usage_type | `"DCH"` | `"DCH"` |
| weather_station_code | `"42\bUSNC0743\bUSNC0744\bUSNC0745\bUSNC0746\bUSNC0747\bUSNC074"` | `"42\bUSNC0743\bUSNC0744\bUSNC0745\bUSNC0746\bUSNC0747\bUSNC074"` |
| weather_station_name | `"chee\x06Wendel\aWendell\x06Wenden\bWendover\x06Wenham\x06Wenona\aWenonah\tWentworth\nWentzville\aWenzhou\x05Weott\fWernersville\vWernigerod"` | `"chee\x06Wendel\aWendell\x06Wenden\bWendover\x06Wenham\x06Wenona\aWenonah\tWentworth\nWentzville\aWenzhou\x05Weott\fWernersville\vWernigerod"` |
| zip_code | `"\x042759\x0527590\x0527591\x0527592\x0527593\x0527594\x0527595\x0527596\x0527597\x05275"` | `"\x042759\x0527590\x0527591\x0527592\x0527593\x0527594\x0527595\x0527596\x0527597\x05275"` |

New benchmarks as of 520cede (including #21):

  • The official library now makes far fewer allocations, but still many times more than ip2x.
  • The official library is now about the same speed when getting all fields of a record, but still much slower when getting a subset of them.
  • The official library is about twice as fast as before, but still several times slower than ip2x.
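| op | ns/op: ip2x | ns/op: ip2loc/v9 | allocs/op: ip2x | allocs/op: ip2loc/v9 | B/op: ip2x | B/op: ip2loc/v9 |
|---|--:|--:|--:|--:|--:|--:|
| Init | 61 (-96%) | 1662 (27.2x) | 2 (-15) | 17 (8.5x) | 128 (-82%) | 696 (5.4x) |
| LookupOnly | 76 (-76%) | 312 (4.1x) | 1 (-5) | 6 (6.0x) | 48 (-79%) | 229 (4.8x) |
| GetAll | 640 (-1%) | 646 (1.0x) | 14 (+2) | 12 (0.9x) | 1688 (-4%) | 1765 (1.0x) |
| GetOneString | 137 (-63%) | 374 (2.7x) | 2 (-5) | 7 (3.5x) | 304 (-37%) | 485 (1.6x) |
| GetOneFloat | 91 (-72%) | 322 (3.6x) | 1 (-5) | 6 (6.0x) | 48 (-79%) | 229 (4.8x) |
| GetTwoString | 213 (-72%) | 767 (3.6x) | 3 (-11) | 14 (4.7x) | 560 (-42%) | 970 (1.7x) |
| GetTwoFloat | 106 (-84%) | 666 (6.3x) | 1 (-11) | 12 (12.0x) | 48 (-90%) | 458 (9.5x) |
| GetNonexistent | 90 (-72%) | 318 (3.5x) | 1 (-5) | 6 (6.0x) | 48 (-79%) | 229 (4.8x) |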

We've tested IP2Location Go with the sample DB26 BIN and got the result below. Please note that "provider", "proxy_type" and "threat" are only available in the IP2Proxy BIN, not in the IP2Location BIN.

[Screenshot: lookup result from the sample DB26 BIN]

Actually, it turns out that the IPv6 DB26 sample BIN has issues. The above was tested using the IPv4 sample BIN, which is fine. We will post an update once the IPv6 DB26 sample BIN has been fixed.

The IPv6 DB26 sample BIN has been fixed.

Thanks @ip2location; it works fine now:

[Screenshot: lookup result from the fixed IPv6 DB26 sample BIN]