/python-grep

implementation of grep written in python

Primary LanguagePython

This is a python implementation of the Unix grep command.

Written during Hacker School, Batch[4], Fall 2012.

Usage

python mygrep.py [--help] [--color] [-h] [-i] [-l] [-n] [-R]
             pattern [FILES [FILES ...]]

Find matches of a pattern in lines of file(s).

positional arguments:
pattern               the pattern to find
FILES                 the files(s) to search

optional arguments:
--help                     show this help message and exit
--color, --colour          highlight matches
-h, --no-filename          print without filename headers
-i, --ignore-case          case-insensitive search
-l, --files-with-matches   print only filenames with matches
-n, --line-number          print line numbers, indexed from 1
-R, -r, --recursive        recursively search directories

If called with no file arguments, reads from standard input and follows grep's behavior when reading from standard input. Can also be used with pipes.

Boyer-Moore Algorithm

As an exercise in optimazation, I looked into string searching algorithms to make mygrep faster. I decided to go with Boyer-Moore because that's what GNU's grep is based on. I worked on a separate mini-version of mygrep for testing:

python search_test.py [--help] [--color] [-R] [-b] pattern [FILES [FILES ...]]

Find occurrences of a pattern in lines of file(s).

positional arguments:
pattern              the pattern to find
FILES                the files(s) to search

optional arguments:
--help               show this help message and exit
--color, --colour    highlight pattern in output
-R, -r, --recursive  recursively search directories
-b, --boyer-moore    search with boyer-moore algorithm

Test

Using naive search python search_test.py King bible.txt:

real    0m2.173s
user    0m2.144s
sys 0m0.024s

Using Boyer_Moore search python search_test.py King bible.txt -b, you can see it's a little faster:

real    0m1.732s
user    0m1.705s
sys 0m0.023s

But not nearly as fast as regular grep grep King bible.txt:

real    0m0.200s
user    0m0.196s
sys 0m0.004s

Resources

##Todo/Extension Ideas

  • use boyer-moore in mygrep.py
  • figure out if output can handle colors, when piping or writing results to a file.