A trigram indexing implementation in Go

Trigram Indexing

This package provides a simple way to build a trigram index over input documents. It is based on an article about Google Code Search.

Here is an introduction to what trigram indexing is and how Google Code Search uses it for searching, but it is in Chinese :) .

How it works

This package uses trigram indexing to extract every trigram from an input string (which we call a document).

The trigram rules are as follows:

  • Upper case is not converted to lower case (this follows the Code Search rule).
  • Spaces are included in trigrams.
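
The rules above can be sketched in a few lines of Go. This is only an illustration, not the package's actual code: `extractTrigrams` is a hypothetical helper, and it assumes trigrams are taken over bytes rather than Unicode code points.

```go
package main

import "fmt"

// extractTrigrams returns every run of three consecutive bytes in doc.
// Case is preserved and spaces are kept, matching the rules above.
// Hypothetical helper for illustration; not part of the package API.
func extractTrigrams(doc string) []string {
	if len(doc) < 3 {
		return nil // too short to contain any trigram
	}
	out := make([]string, 0, len(doc)-2)
	for i := 0; i+3 <= len(doc); i++ {
		out = append(out, doc[i:i+3])
	}
	return out
}

func main() {
	fmt.Printf("%q\n", extractTrigrams("Code is"))
	// prints ["Cod" "ode" "de " "e i" " is"]
}
```

Note that "Cod" and "cod" are different trigrams under these rules, so a query is case sensitive.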

Install

go get github.com/kkdai/trigram

Usage

package main

import (
	"fmt"

	"github.com/kkdai/trigram"
)

func main() {
	ti := trigram.NewTrigramIndex()
	ti.Add("Code is my life")        // doc 1
	ti.Add("Search")                 // doc 2
	ti.Add("I write a lot of Codes") // doc 3

	// Print every trigram in the index together with its entry.
	fmt.Println("It has", len(ti.TrigramMap))
	for k, v := range ti.TrigramMap {
		fmt.Println("trigram=", k, " obj=", v)
	}

	// Find which documents contain the string "Code".
	ret := ti.Query("Code")
	fmt.Println("Query ret=", ret)
	// Documents 1 and 3 match.
}
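
To show why the query above returns documents 1 and 3, here is a minimal sketch of a trigram lookup. It is not the package's implementation: the `index` type and its `add` and `query` methods are hypothetical names, and it assumes a query is answered by intersecting the document ID sets of each trigram in the query string.

```go
package main

import (
	"fmt"
	"sort"
)

// index maps each trigram to the set of document IDs that contain it.
// Hypothetical type for illustration only.
type index struct {
	postings map[string]map[int]bool
	nextID   int
}

func newIndex() *index {
	return &index{postings: map[string]map[int]bool{}, nextID: 1}
}

// add records every trigram of doc and returns the new document ID.
func (ix *index) add(doc string) int {
	id := ix.nextID
	ix.nextID++
	for i := 0; i+3 <= len(doc); i++ {
		t := doc[i : i+3]
		if ix.postings[t] == nil {
			ix.postings[t] = map[int]bool{}
		}
		ix.postings[t][id] = true
	}
	return id
}

// query returns the sorted IDs of documents containing every trigram of q.
func (ix *index) query(q string) []int {
	if len(q) < 3 {
		return nil
	}
	var result map[int]bool
	for i := 0; i+3 <= len(q); i++ {
		set := ix.postings[q[i:i+3]]
		if result == nil {
			// First trigram: start from a copy of its ID set.
			result = make(map[int]bool, len(set))
			for id := range set {
				result[id] = true
			}
			continue
		}
		// Later trigrams: keep only IDs present in both sets.
		for id := range result {
			if !set[id] {
				delete(result, id)
			}
		}
	}
	ids := make([]int, 0, len(result))
	for id := range result {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	ix := newIndex()
	ix.add("Code is my life")
	ix.add("Search")
	ix.add("I write a lot of Codes")
	fmt.Println(ix.query("Code")) // prints [1 3]
}
```

A trigram intersection only yields candidates: a document can contain every trigram of the query without containing the query as one contiguous string, so a full search would still verify each candidate.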

Benchmark

Query time is still being improved.

BenchmarkAdd-4          300000        6743 ns/op
BenchmarkDelete-4       500000        4021 ns/op
BenchmarkQuery-4         10000     7894005 ns/op
BenchmarkIntersect-4    300000        4496 ns/op

For comparison, here are the benchmark results for https://github.com/dgryski/go-trigram, which I keep as a reference point for tracking improvements:

BenchmarkAdd-4         1000000        1063 ns/op
BenchmarkDelete-4       100000      140392 ns/op
BenchmarkQuery-4         10000      474320 ns/op

Inspired By

Project52

This is one of my Project 52 projects.

License

This package is licensed under the MIT license. See LICENSE for details.
