A Python library to parse strings and extract information from structured/unstructured data
- parsing and matching patterns in a string (log, message, etc.)
- avoiding complex, hand-written regular expressions
- extracting information from structured/unstructured data
first, install regex with pip:
$ sudo pip install regex
or from source:
$ sudo python setup.py install
then download, uncompress and install pygrok from source:
$ tar zxvf pygrok-xx.tar.gz
$ cd pygrok_dir
$ sudo python setup.py install
>>> import pygrok
>>> text = 'gary is male, 25 years old and weighs 68.5 kilograms'
>>> pattern = '%{WORD:name} is %{WORD:gender}, %{NUMBER:age} years old and weighs %{NUMBER:weight} kilograms'
>>> print(pygrok.grok_match(text, pattern))
{'gender': 'male', 'age': '25', 'name': 'gary', 'weight': '68.5'}
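To illustrate the idea behind the example above, here is a minimal sketch of how a `%{NAME:field}` placeholder could be expanded into a named capture group. This is not pygrok's actual implementation: the `PATTERNS` table, `expand` and `sketch_match` are hypothetical names invented for this illustration, the two regular expressions are simplified stand-ins for the real pattern definitions, and the sketch uses only the standard `re` module.

```python
import re

# Hypothetical, simplified pattern table (an assumption for this sketch).
# pygrok's real definitions live in its pattern files.
PATTERNS = {
    "WORD": r"\b\w+\b",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}

def expand(pattern):
    """Replace each %{NAME:field} placeholder with a named capture group."""
    def to_group(m):
        name, field = m.group(1), m.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[name])
    return re.sub(r"%\{(\w+):(\w+)\}", to_group, pattern)

def sketch_match(text, pattern):
    """Return the captured fields as a dict, or None when nothing matches."""
    m = re.search(expand(pattern), text)
    return m.groupdict() if m else None

text = 'gary is male, 25 years old and weighs 68.5 kilograms'
pattern = '%{WORD:name} is %{WORD:gender}, %{NUMBER:age} years old and weighs %{NUMBER:weight} kilograms'
result = sketch_match(text, pattern)
print(sorted(result.items()))
```

The sketch treats everything outside the placeholders as plain regular-expression text, so a literal such as `.` or `(` in the pattern would need escaping by the caller.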
Because Python's re module does not support the atomic grouping syntax (?>...), pygrok requires regex to be installed.
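A quick way to see whether a given regular expression engine accepts atomic groups is to try compiling one. The helper below is a hypothetical sketch, not part of pygrok; it assumes the engine module exposes `compile` and an `error` exception class, as both `re` and `regex` do. The result for the built-in `re` depends on the Python version in use.

```python
import re

def supports_atomic_groups(engine):
    """Return True if the engine module compiles an atomic group, (?>...)."""
    try:
        engine.compile(r"(?>a+)b")
    except engine.error:
        return False
    return True

print("re supports atomic groups:", supports_atomic_groups(re))

# The same check against the third-party regex module, if it is installed.
try:
    import regex
except ImportError:
    regex = None

if regex is not None:
    print("regex supports atomic groups:", supports_atomic_groups(regex))
```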
pygrok is inspired by Grok, developed by Jordan Sissel. It is not a wrapper around Jordan Sissel's Grok; it is implemented entirely by me.
Grok is a simple tool that lets you easily parse strings, logs and other files. With grok, you can turn unstructured log and event data into structured data. pygrok does the same thing.
I recommend having a look at the logstash grok filter; it explains how Grok-like tools work.
The pattern files come from the logstash grok filter's pattern files.
I use Trello to manage the TODO list of this project.
mail:garygaowork@gmail.com
twitter:@garyelephant