EmailReplyParser

EmailReplyParser is a PHP port of GitHub's EmailReplyParser library, which is written in Ruby.

It is a small library for parsing plain-text email content.

Installation

If you don't use a ClassLoader in your application, just require the provided autoloader:

<?php

require_once 'src/autoload.php';

You're done.

Usage

Instantiate an Email object and you're done:

<?php

$email = new \EmailReplyParser\Email();

$reply = $email->read($emailContent);
// same as:
$reply = $email->getFragments();

Alternatively, you can use the static method:

$reply = \EmailReplyParser\EmailReplyParser::read($emailContent);

$reply is an array of Fragment objects, so you can access one with, e.g., $fragment = $reply[0]. To get the content of a fragment, call its getContent() method.

A Fragment can be a signature, quoted text, or hidden text. Here is the API:

<?php
$fragment = $reply[0];

// Get the content
$fragment->getContent();

// Whether the fragment is a signature or not
$fragment->isSignature();

// Whether the fragment is quoted or not
$fragment->isQuoted();

// Whether the fragment is hidden or not
$fragment->isHidden();

// Whether the fragment is empty or not
$fragment->isEmpty();
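
As an illustration of how these methods combine, here is a minimal sketch that keeps only the fragments that are neither hidden nor empty and joins their content. The helper name visibleText() is hypothetical and not part of the library; it relies only on the isHidden(), isEmpty() and getContent() methods listed above.

```php
<?php

/**
 * Join the content of every fragment that is neither hidden nor empty.
 * visibleText() is a hypothetical helper, not part of EmailReplyParser.
 *
 * @param array $fragments Objects exposing isHidden(), isEmpty() and
 *                         getContent(), e.g. the array returned by read().
 */
function visibleText(array $fragments): string
{
    $parts = [];

    foreach ($fragments as $fragment) {
        // Skip anything the parser flagged as hidden, and blank fragments.
        if ($fragment->isHidden() || $fragment->isEmpty()) {
            continue;
        }

        // Drop trailing whitespace so fragments join cleanly.
        $parts[] = rtrim($fragment->getContent());
    }

    return implode("\n", $parts);
}
```

With the library loaded, it could be called as visibleText(\EmailReplyParser\EmailReplyParser::read($emailContent)).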

Known Issues

Quoted Headers

Quoted headers aren't picked up if there's an extra line break:

On <date>, <author> wrote:

> blah

They also aren't picked up if the email client breaks the header across multiple lines. Gmail, for example, wraps any line longer than 80 characters.

On <date>, <author>
wrote:
> blah

The On ... wrote: header above, along with everything that follows it, can be removed with the following regex:

$fragment_without_date_author = preg_replace(
    '/\nOn(.*?)wrote:(.*?)$/si',
    '',
    $fragment->getContent()
);

Note, though, that this regex searches for the English words "On" and "wrote", so it won't work with other languages.
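
The regex above can be wrapped in a small helper so that it works on any string, not only on a Fragment. This is only a sketch: the function name stripQuoteHeader() is hypothetical, and the trailing rtrim() is an addition that the original snippet does not perform.

```php
<?php

/**
 * Remove an English "On ... wrote:" header and everything after it.
 * stripQuoteHeader() is a hypothetical helper, not part of the library.
 */
function stripQuoteHeader(string $content): string
{
    // Same pattern as above: "s" lets "." span newlines, so a header
    // wrapped over several lines is matched too; "i" ignores case.
    $stripped = preg_replace('/\nOn(.*?)wrote:(.*?)$/si', '', $content);

    // preg_replace() returns null on failure; fall back to the input.
    return rtrim($stripped ?? $content);
}
```

Because the pattern starts with \n, a header on the very first line of the content is left untouched. The match is also case-insensitive, so any line that begins with "on" and is later followed by "wrote:" is treated as a header.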

Possible solution: Remove "reply@reply.github.com" lines...

Weird Signatures

Lines starting with - or _ sometimes mark the beginning of signatures:

Hello

--
Rick

Not everyone follows this convention:

Hello

Mr Rick Olson
Galactic President Superstar Mc Awesomeville
GitHub

**********************DISCLAIMER***********************************
* Note: blah blah blah                                            *
**********************DISCLAIMER***********************************
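
For messages that do follow the dash or underscore convention, a delimiter line can be located with a simple check. The sketch below is not the library's detection logic: the function name findSignatureDelimiter() is hypothetical, and the rule it applies (a line consisting only of two or more - or _ characters) is an assumption chosen for illustration. It does not recognize the free-form signature or the disclaimer shown above.

```php
<?php

/**
 * Return the zero-based index of the first line made up solely of two or
 * more dashes or underscores, or null when there is no such line.
 * Hypothetical helper; the delimiter rule is an assumption.
 */
function findSignatureDelimiter(string $text): ?int
{
    // \R matches any newline sequence (\n, \r\n, \r).
    $lines = preg_split('/\R/', $text);

    foreach ($lines as $index => $line) {
        // Allow trailing whitespace, as in the common "-- " delimiter.
        if (preg_match('/^[-_]{2,}\s*$/', $line) === 1) {
            return $index;
        }
    }

    return null;
}
```

The lines from the returned index onward can then be treated as a candidate signature.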

Strange Quoting

Apparently, prefixing lines with > isn't universal either:

Hello

--
Rick

________________________________________
From: Bob [reply@reply.github.com]
Sent: Monday, March 14, 2011 6:16 PM
To: Rick

Unit Tests

To run the test suite, first run Composer to set up the autoloader:

php composer.phar install

Then run the following command:

phpunit

Credits

License

EmailReplyParser is released under the MIT License. See the bundled LICENSE file for details.