heiglandreas/Org_Heigl_Hyphenator

English hyphenation results


Hi!

I tried this code:

use \Org\Heigl\Hyphenator as h;
$hyphenator = h\Hyphenator::factory();
echo $hyphenator->hyphenate('hyphenation');
// hy-phe-na-ti-on
echo $hyphenator->hyphenate('chocolate');
// choco-late

Expected results are:
hy-phen-a-tion
choc-o-late

My config is:

noHyphenateString = null
hyphen = "-"
leftMin = 1
rightMin = 1
wordMin = 3
quality = 9
customHyphen = "=="
defaultLocale = "en_US"
tokenizers = "Whitespace,Punctuation"
filters = "Simple,CustomMarkup"

Any idea why I am seeing different results?

Thanks!
Jose

Hey @joseflorido, sorry for the late response. It looks like the hyphenation patterns this library is based on (the American English hyphenation patterns for OpenOffice.org) do not contain patterns that allow the hyphenation you expect.

Since other hyphenation algorithms exist (some of them fairly expensive), other websites may well propose different hyphenations.

That said, I'm currently checking whether a newer dictionary file is available that might match your expectations.

Until then, you can add your own hyphenation patterns as described in #49 (comment).
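
To illustrate why the pattern file decides the result, here is a minimal sketch of Liang-style pattern matching, which as far as I understand is the general technique behind TeX-derived dictionaries such as the OpenOffice.org ones. Everything in it is an assumption made for the example: the patterns are invented and not taken from the real dictionary, and `parsePattern` and `hyphenateWith` are hypothetical helpers, not part of this library's API.

```php
<?php
// Toy Liang-style hyphenation. All patterns below are made up for illustration.

// Split a pattern such as "he2n" into its letters ("hen") and the
// priority level of every gap around those letters ([0, 0, 2, 0]).
function parsePattern(string $pattern): array
{
    $letters = '';
    $levels = [0];
    foreach (str_split($pattern) as $ch) {
        if (ctype_digit($ch)) {
            $levels[strlen($letters)] = (int) $ch; // gap before the next letter
        } else {
            $letters .= $ch;
            $levels[] = 0;                         // gap after this letter
        }
    }
    return [$letters, $levels];
}

// Apply all patterns to a word. For each gap the highest level wins;
// an odd level allows a break, an even level forbids it.
function hyphenateWith(string $word, array $patterns, string $hyphen = '-'): string
{
    $padded = '.' . strtolower($word) . '.';       // dots mark the word boundaries
    $scores = array_fill(0, strlen($padded) + 1, 0);
    foreach ($patterns as $pattern) {
        [$letters, $levels] = parsePattern($pattern);
        $offset = 0;
        while (($pos = strpos($padded, $letters, $offset)) !== false) {
            foreach ($levels as $i => $level) {
                $scores[$pos + $i] = max($scores[$pos + $i], $level);
            }
            $offset = $pos + 1;
        }
    }
    $out = '';
    for ($i = 0, $len = strlen($word); $i < $len; $i++) {
        // $scores[$i + 1] is the gap in front of $word[$i]
        if ($i > 0 && $scores[$i + 1] % 2 === 1) {
            $out .= $hyphen;
        }
        $out .= $word[$i];
    }
    return $out;
}

// Hypothetical pattern sets: the second adds higher-priority patterns.
$toyBase = ['y1p', 'e1n', 'a1t', 'i1o'];
$toyExtended = array_merge($toyBase, ['he2n', 'n1a', 'i2on.']);

echo hyphenateWith('hyphenation', $toyBase), "\n";     // hy-phe-na-ti-on
echo hyphenateWith('hyphenation', $toyExtended), "\n"; // hy-phen-a-tion
```

The same word breaks differently depending only on which patterns are loaded, so a dictionary that lacks the relevant patterns cannot produce the breaks you expect, whatever the configuration.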