This operates on the CALO dataset, and is pretty hackish.
npm install
# or
yarn
Get the CALO dataset. Untar it here. The directory structure should look something like:
|-- index.js
|-- maildir
|-- allen-p
|-- inbox
|-- ...
|-- arnold-j
|-- ...
|-- package.json
|-- ...
Then, simply run node index.js
. The application will read in emails,
convert them to JSON, write them in place, and delete the original
plaintext emails. The process isn't great, but gave me a quick and easy
way to cancel and resume the process without having to start from the
beginning. As a result, if you want to modify the conversion, you'll
have to remove the maildir
and unpack the tar with the plaintext
emails again.
A compressed mongodb back up file. http://mongodb-enron-email.s3-website-us-east-1.amazonaws.com/