
Python script for searching through email documents in the Enron Corpus




sfor is a Python script made specifically for searching through email documents in the Enron Corpus. It employs a record-level index and excludes stopwords from the NLTK Stopwords Corpus [1].
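
As a rough illustration of what a record-level index looks like, the sketch below maps each term to the set of documents that contain it, without storing positions, and skips stopwords. It is not sfor's actual code: the `STOPWORDS` subset, the `tokenize` and `build_index` helpers, and the sample file names are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical stopword subset for the example; sfor reads its list
# from nltk-stopwords.txt.
STOPWORDS = {"the", "a", "is", "to", "and", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each non-stopword term to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            if token not in STOPWORDS:
                index[token].add(name)
    return index


# Made-up sample documents, keyed by file name.
docs = {
    "mail1.txt": "The meeting is moved to Friday",
    "mail2.txt": "Budget review and meeting notes",
}
index = build_index(docs)
```

Because the index stores only document names per term, it can answer "which emails mention this word" but not where in the email the word occurs.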


It can search for multiple terms and match ANY or ALL of the search terms.

It builds, rebuilds, or loads its index depending on whether the index file (i_index.txt) exists in the search directory and whether any files in the search directory have been updated since the last indexing.
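
One plausible way to make that decision is to compare file modification times against the index file's own modification time. The sketch below assumes this approach; sfor may detect updates differently, and the `index_action` function and its return values are hypothetical names.

```python
import os

INDEX_FILE = "i_index.txt"  # index file name as given in this README


def index_action(search_dir):
    """Return "index", "reindex", or "load" for the given search directory."""
    index_path = os.path.join(search_dir, INDEX_FILE)
    if not os.path.exists(index_path):
        return "index"  # no index file yet: build one from scratch

    index_mtime = os.path.getmtime(index_path)
    for root, _dirs, files in os.walk(search_dir):
        for name in files:
            path = os.path.join(root, name)
            if path == index_path:
                continue  # skip the index file itself
            if os.path.getmtime(path) > index_mtime:
                return "reindex"  # a document changed after the last indexing
    return "load"  # index is at least as new as every document
```

Note that a check based only on modification times does not notice deleted files; that limitation belongs to this sketch and says nothing about sfor itself.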


Keep sfor and nltk-stopwords.txt in the same directory.

Syntax: ./sfor <term1> <term2> ... <termn> -d <path/to/search> [-a]
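
For readers who want to see how a command line of this shape can be parsed, here is a minimal sketch using Python's standard `argparse` module. How sfor actually parses its arguments is not documented here, so the destination names `terms`, `directory`, and `match_all` are assumptions.

```python
import argparse


def parse_cli(argv):
    """Parse arguments shaped like: <term1> ... <termn> -d <path> [-a]."""
    parser = argparse.ArgumentParser(prog="sfor")
    parser.add_argument("terms", nargs="+", help="one or more search terms")
    parser.add_argument("-d", dest="directory", required=True,
                        help="directory to search")
    parser.add_argument("-a", dest="match_all", action="store_true",
                        help="return only results matching ALL terms")
    return parser.parse_args(argv)


# Example invocation with made-up terms and path.
args = parse_cli(["enron", "fraud", "-d", "./maildir", "-a"])
```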

By default, sfor will look for matches to ANY of the search terms. To return only results that match ALL terms, use -a.
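
With a record-level index, ANY and ALL matching reduce to a set union and a set intersection over the per-term document sets. The sketch below shows the idea under that assumption; the `search` function, the `match_all` parameter, and the sample index are illustrative and not taken from sfor.

```python
def search(index, terms, match_all=False):
    """Return the documents matching ANY (default) or ALL of the terms."""
    # Look up each term's document set; unknown terms yield an empty set.
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if match_all:
        return set.intersection(*postings)  # documents containing every term
    return set.union(*postings)  # documents containing at least one term


# Made-up index: term -> set of document names.
sample_index = {
    "enron": {"mail1.txt", "mail2.txt"},
    "fraud": {"mail2.txt", "mail3.txt"},
}
any_hits = search(sample_index, ["enron", "fraud"])
all_hits = search(sample_index, ["enron", "fraud"], match_all=True)
```

In ALL mode a single term that appears in no document empties the whole result, whereas in ANY mode that term is simply ignored.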


A demo GIF is available but is not linked from this README because it flashes while rendering.

[1] The NLTK Stopwords Corpus used is obtained from NLTK.org.