

Primary language: PHP. License: MIT.

Limelight


A PHP analyzer and parser for Japanese text.
  • Split Japanese text into individual, full words
  • Find parts of speech for words
  • Find dictionary entries (lemmas) for conjugated words
  • Get readings and pronunciations for words
  • Build furigana for words
  • Convert Japanese to romaji (Latin-alphabet transcription)


Version Notes

  • April 25, 2016: The Limelight API changed in version 1.6.0. The new API uses collection methods to give developers more control over Limelight parse results. See the wiki for the updated documentation.
  • April 11, 2016: php-mecab, the MeCab bindings that Limelight uses, was updated to version 0.6.0 in December 2015 to support PHP 7. Bindings older than 0.6.0 no longer work with the master branch of Limelight. If you are using an older version of php-mecab, update your bindings or use the php-mecab_pre_0.6.0 version.
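To find out which side of the 0.6.0 line your bindings fall on, you could compare the version PHP reports for the extension against 0.6.0. This is a minimal sketch that is not part of Limelight. It assumes php-mecab registers itself under the extension name `mecab`, so that `phpversion('mecab')` returns its version string (PHP returns `false` when the extension is not loaded). The helper name `mecabBindingsAreCurrent()` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: true when the given php-mecab version string is
 * 0.6.0 or newer. A false/empty value means the extension is not loaded.
 */
function mecabBindingsAreCurrent($version)
{
    if (!is_string($version) || $version === '') {
        return false;
    }

    return version_compare($version, '0.6.0', '>=');
}

// Assumption: php-mecab registers itself as the 'mecab' extension.
$installed = phpversion('mecab');

if (mecabBindingsAreCurrent($installed)) {
    echo "php-mecab {$installed} should work with the master branch of Limelight.\n";
} else {
    echo "php-mecab is missing or older than 0.6.0: update it or use the php-mecab_pre_0.6.0 version.\n";
}
```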

Install Limelight

Using Docker

From the project root, build the image:

docker build -f docker/Dockerfile -t limelight .

Once it is built, run the container:

docker run --name limelight -v /host/path/to/limelight:/usr/limelight -d --rm limelight

Access the project in the container:

docker exec -it limelight bash

Install composer dependencies from within the container:

composer install

Without Docker

Requirements
  • php > 5.6
Dependencies

Before installing Limelight, you must install both MeCab and the PHP extension php-mecab on your system.

Linux Ubuntu Users

Use the install script included in this repository. The script only works for Ubuntu and PHP 7. Download the script:

curl -O https://raw.githubusercontent.com/nihongodera/limelight/master/install_mecab_php-mecab.sh

Make the file executable:

chmod +x install_mecab_php-mecab.sh

Execute the script:

./install_mecab_php-mecab.sh

You may need to restart your server to complete the process.

For information about what the script does, see here.

Other Systems

Please see this page to learn more about installing on your system.
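After installing the dependencies, you could confirm from PHP itself that the prerequisites are visible before running Composer. This is a minimal sketch, not a Limelight tool: the function name `missingLimelightRequirements()` is hypothetical, it reads the "php > 5.6" requirement as 5.6.0 or newer, and it assumes php-mecab appears as `mecab` in `get_loaded_extensions()`. It checks only the PHP side; it cannot tell whether the MeCab library or its dictionary is installed. It avoids scalar type hints so it also parses on PHP 5.6.

```php
<?php

/**
 * Hypothetical helper: lists unmet Limelight prerequisites for the given
 * PHP version and loaded extensions. An empty array means all were found.
 */
function missingLimelightRequirements($phpVersion, array $extensions)
{
    $missing = [];

    // Assumption: "php > 5.6" is read as 5.6.0 or newer.
    if (version_compare($phpVersion, '5.6.0', '<')) {
        $missing[] = "PHP 5.6 or newer (found {$phpVersion})";
    }

    // Assumption: php-mecab is listed as 'mecab' among loaded extensions.
    if (!in_array('mecab', array_map('strtolower', $extensions), true)) {
        $missing[] = 'the php-mecab extension';
    }

    return $missing;
}

$missing = missingLimelightRequirements(PHP_VERSION, get_loaded_extensions());

echo $missing === []
    ? "PHP-side requirements found.\n"
    : 'Missing: ' . implode(', ', $missing) . "\n";
```

Run it with the same PHP binary (or web server) that will load Limelight, since the CLI and the server can use different configurations.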

Install Limelight

Install Limelight through composer.

composer require nihongodera/limelight

Parse Text

Create a new instance of Limelight\Limelight. The constructor takes no arguments.

use Limelight\Limelight;
$limelight = new Limelight();

Use the parse() method on the Limelight object to parse Japanese text.

$results = $limelight->parse('庭でライムを育てています。');

The returned object is an instance of Limelight\Classes\LimelightResults.

Get Results

Get results for the entire text using methods available on LimelightResults.

$results = $limelight->parse('庭でライムを育てています。');

echo 'Words: ' . $results->string('word') . "\n";
echo 'Readings: ' . $results->string('reading') . "\n";
echo 'Pronunciations: ' . $results->string('pronunciation') . "\n";
echo 'Lemmas: ' . $results->string('lemma') . "\n";
echo 'Parts of speech: ' . $results->string('partOfSpeech') . "\n";
echo 'Hiragana: ' . $results->toHiragana()->string('word') . "\n";
echo 'Katakana: ' . $results->toKatakana()->string('word') . "\n";
echo 'Romaji: ' . $results->string('romaji', ' ') . "\n";
echo 'Furigana: ' . $results->string('furigana') . "\n";

Output:
Words: 庭でライムを育てています。
Readings: ニワデライムヲソダテテイマス。
Pronunciations: ニワデライムヲソダテテイマス。
Lemmas: 庭でライムを育てる。
Parts of speech: noun postposition noun postposition verb symbol
Hiragana: にわでらいむをそだてています。
Katakana: ニワデライムヲソダテテイマス。
Romaji: niwa de raimu o sodateteimasu.
Furigana: 庭(にわ)でライムを育(そだ)てています。
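The Hiragana and Katakana results differ from the readings only in which kana script they use. As a rough illustration of that kind of conversion (not Limelight's own toHiragana()/toKatakana() implementation), PHP's mbstring function `mb_convert_kana()` can switch full-width kana between scripts: mode `c` turns katakana into hiragana and mode `C` does the reverse. The sketch assumes the mbstring extension is enabled; the two function names are hypothetical.

```php
<?php

// Full-width katakana -> full-width hiragana (mode 'c' of mb_convert_kana).
function katakanaToHiragana(string $text): string
{
    return mb_convert_kana($text, 'c', 'UTF-8');
}

// Full-width hiragana -> full-width katakana (mode 'C' of mb_convert_kana).
function hiraganaToKatakana(string $text): string
{
    return mb_convert_kana($text, 'C', 'UTF-8');
}

$reading = 'ニワデライムヲソダテテイマス。';

echo katakanaToHiragana($reading) . "\n"; // にわでらいむをそだてています。
echo hiraganaToKatakana(katakanaToHiragana($reading)) . "\n"; // ニワデライムヲソダテテイマス。
```

Characters that are not kana, such as kanji and the full stop 。, pass through unchanged.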

Alter the collection of words however you like using the library of collection methods.

Retrieve individual words from the LimelightResults object with any of several collection methods, then call methods on the returned LimelightWord object.

$results = $limelight->parse('庭でライムを育てています。');

$word1 = $results->pull(2);

$word2 = $results->where('word', '庭');

echo $word1->string('romaji') . "\n";

echo $word2->string('furigana') . "\n";

Output:
raimu
庭(にわ)
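Furigana is commonly rendered in HTML with ruby markup, and kana that a word shares with its reading (okurigana) stay outside the annotation, as in 育(そだ)て. The sketch below is a hypothetical illustration of that idea, not Limelight's implementation: `buildFurigana()` is an invented name, the output format is an assumption, and it expects the word's kana and the reading to be in the same script (here hiragana). It only trims a shared kana suffix, so kana in the middle of a compound are not handled.

```php
<?php

/**
 * Hypothetical helper: wraps a word in ruby markup, leaving any trailing
 * kana that the word and its reading share outside the annotation.
 */
function buildFurigana(string $word, string $reading): string
{
    // Split both strings into UTF-8 characters.
    $wordChars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $readingChars = preg_split('//u', $reading, -1, PREG_SPLIT_NO_EMPTY);
    $wordLen = count($wordChars);
    $readingLen = count($readingChars);

    // Count matching characters from the end, keeping at least one
    // character in both the base text and the annotation.
    $shared = 0;
    while (
        $shared < $wordLen - 1
        && $shared < $readingLen - 1
        && $wordChars[$wordLen - 1 - $shared] === $readingChars[$readingLen - 1 - $shared]
    ) {
        $shared++;
    }

    $base = implode('', array_slice($wordChars, 0, $wordLen - $shared));
    $ruby = implode('', array_slice($readingChars, 0, $readingLen - $shared));
    $okurigana = implode('', array_slice($wordChars, $wordLen - $shared));

    // A word identical to its reading (pure kana) needs no annotation.
    if ($base === $ruby) {
        return $word;
    }

    return "<ruby><rb>{$base}</rb><rp>(</rp><rt>{$ruby}</rt><rp>)</rp></ruby>{$okurigana}";
}

echo buildFurigana('庭', 'にわ') . "\n";
echo buildFurigana('育てる', 'そだてる') . "\n";
```

The `<rp>` parentheses are what browsers without ruby support display, which is why rendered furigana can degrade to text like 庭(にわ).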

Methods on the LimelightResults object and the LimelightWord object follow the same conventions, but LimelightResults methods are plural (words()) while LimelightWord methods are singular (word()).

Alternatively, loop through all the words on the LimelightResults object.

$results = $limelight->parse('庭でライムを育てています。');

foreach ($results as $word) {
    echo $word->word() . ' is a ' . $word->partOfSpeech() . ' read like ' . $word->reading() . "\n";
}

Output:
庭 is a noun read like ニワ
で is a postposition read like デ
ライム is a noun read like ライム
を is a postposition read like ヲ
育てています is a verb read like ソダテテイマス
。 is a symbol read like 。
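Because Limelight's collection methods were derived from Laravel's, pull() in the earlier example presumably removes the word from the results as well as returning it, and where() presumably returns a filtered collection, as their Laravel counterparts do. The class below is a hypothetical, heavily simplified stand-in written only to illustrate those semantics, together with a glue-aware string() and foreach support through PHP's IteratorAggregate interface. `WordCollection` and its internals are invented for this example and are not Limelight's LimelightResults; the sample data reuses romaji from the output above. It uses PHP 7.1+ syntax.

```php
<?php

/** Hypothetical stand-in for a results collection; not Limelight's class. */
class WordCollection implements IteratorAggregate, Countable
{
    /** @var array<int, array<string, string>> */
    private $words;

    public function __construct(array $words)
    {
        $this->words = $words;
    }

    // Laravel-style pull(): remove the item at $key and return it.
    public function pull(int $key): ?array
    {
        $word = $this->words[$key] ?? null;
        unset($this->words[$key]);

        return $word;
    }

    // Laravel-style where(): a new collection of words whose $field equals $value.
    public function where(string $field, string $value): self
    {
        return new self(array_filter(
            $this->words,
            function (array $word) use ($field, $value): bool {
                return ($word[$field] ?? null) === $value;
            }
        ));
    }

    // Join one field across all words, with an optional separator.
    public function string(string $field, string $glue = ''): string
    {
        return implode($glue, array_column($this->words, $field));
    }

    // Lets foreach iterate over the words.
    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->words);
    }

    public function count(): int
    {
        return count($this->words);
    }
}

$results = new WordCollection([
    ['word' => '庭', 'romaji' => 'niwa'],
    ['word' => 'で', 'romaji' => 'de'],
    ['word' => 'ライム', 'romaji' => 'raimu'],
]);

echo $results->string('romaji', ' ') . "\n";            // niwa de raimu
echo $results->pull(2)['romaji'] . "\n";                // raimu
echo count($results) . "\n";                            // 2
echo $results->where('word', '庭')->string('romaji') . "\n"; // niwa

foreach ($results as $word) {
    echo $word['word'] . "\n";
}
```

If Limelight's pull() does mutate the collection the way this sketch does, call it on a copy when you still need the full results afterwards.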

Full Documentation

Full documentation for Limelight can be found on the Limelight Wiki page.

Sources, Contributions, and Contributing

The Japanese parsing logic used in Limelight was adapted from Kimtaro's excellent Ruby program Ve. A big thank you to him and to everyone else who contributed to that project.

Limelight relies heavily on both MeCab and php-mecab.

Collection methods and methods in the Arr class were derived from Laravel's collection methods.

Contributions are more than welcome.
