This parser parses MediaWiki categories.
This package is Composer-enabled. Just require it in your composer.json.
"require": {
    "kolyunya/wiki-parser": "*"
}
The following code parses English nouns from Wiktionary and saves them to a file.
// Create a parser instance.
$parser = new Parser();
// Set host to Wiktionary.
$host = new Wiktionary();
$parser->setHost($host);
// Set language to English.
$language = new EnglishLanguage();
$parser->setLanguage($language);
// Set category to nouns.
$category = new NounsCategory();
$parser->setCategory($category);
// Add a filter which will filter out all non-alphabetical words.
$filter = new AlphabetFilter();
$parser->addFilter($filter);
// Create a processor which will write all words to a file.
$processor = new FileSaver();
$parser->addProcessor($processor);
// Perform parsing.
$parser->parse();
The following filters are available:
- WordFilter - passes words matching the ^\w+$ regular expression pattern.
- AlphabetFilter - passes words containing only alphabetical characters of the corresponding language.
- MinimumLengthFilter - passes words longer than a specified length.
- MaximumLengthFilter - passes words shorter than a specified length.
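To illustrate what these filters accept, here is a minimal standalone sketch that does not use the library classes. The helper functions `matchesWordPattern`, `characterCount` and `isWithinLength` are hypothetical names invented for this illustration. Treating both length bounds as exclusive is an assumption based on the wording "longer than" and "shorter than"; the library's actual implementations may differ.

```php
<?php

// Hypothetical helpers for illustration only; they are not part of the library.

// Mirrors the WordFilter description: the whole word must match ^\w+$.
function matchesWordPattern(string $word): bool
{
    return preg_match('/^\w+$/', $word) === 1;
}

// Counts characters rather than bytes, assuming UTF-8 input.
function characterCount(string $word): int
{
    return count(preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY));
}

// Mirrors MinimumLengthFilter ("longer than") and MaximumLengthFilter ("shorter than").
// Exclusive bounds are an assumption based on that wording.
function isWithinLength(string $word, int $minimum, int $maximum): bool
{
    $length = characterCount($word);
    return $length > $minimum && $length < $maximum;
}

$words = ['cat', 'ice-cream', 'a', 'encyclopedia'];

// Keep only the words that pass both checks.
$passed = array_values(array_filter($words, function (string $word): bool {
    return matchesWordPattern($word) && isWithinLength($word, 2, 10);
}));

print_r($passed);
```

In this sketch only `cat` passes: `ice-cream` contains a hyphen, `a` is too short, and `encyclopedia` is too long.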
The following processors are available:
- StdoutPrinter - prints all words to stdout.
- FileSaver - saves all words to a specified file.
- DatabaseSaver - saves all words to a database.
- LowercaseShifter - converts all words to lowercase.
- UppercaseShifter - converts all words to uppercase.
To add an arbitrary language, implement the LanguageInterface. It contains only two methods. The getCode method must return the standard language code (e.g. en for English). The getAlphabet method must return an array of the letters in the language's alphabet.
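As a rough sketch of such an implementation, the example below defines a hypothetical ItalianLanguage class. The interface declared at the top is only a stand-in so that the snippet runs on its own; its method signatures are assumptions based on the description above, and with the library installed you would import the real LanguageInterface instead. Returning lowercase letters only, and leaving out accented vowels, are also assumptions, since how AlphabetFilter uses the array is not shown here.

```php
<?php

// Stand-in for the library's interface so this sketch runs on its own.
// The method signatures are assumptions based on the description above.
interface LanguageInterface
{
    public function getCode();
    public function getAlphabet();
}

// Hypothetical custom language; not part of the library.
class ItalianLanguage implements LanguageInterface
{
    // Returns the standard language code.
    public function getCode()
    {
        return 'it';
    }

    // Returns the 21 letters of the Italian alphabet as an array.
    public function getAlphabet()
    {
        return str_split('abcdefghilmnopqrstuvz');
    }
}

$language = new ItalianLanguage();
echo $language->getCode(), ': ', count($language->getAlphabet()), " letters\n";
```

An instance of such a class would then be passed to the parser with `$parser->setLanguage($language)`, in the same way as EnglishLanguage in the example above.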
The recommended way of implementing custom categories is to extend the BaseCategory class. Use the setTitle method in your category class constructor to add titles for specific languages.
class NounsCategory extends BaseCategory implements CategoryInterface
{
    public function __construct()
    {
        $this->setTitle(new EnglishLanguage(), 'Category:English_nouns');
        $this->setTitle(new FrenchLanguage(), 'Catégorie:Noms_communs_en_français');
        $this->setTitle(new GermanLanguage(), 'Kategorie:Substantiv_(Deutsch)');
        $this->setTitle(new RussianLanguage(), 'Категория:Русские_существительные');
    }
}
If you need to implement a custom processor, you have two options.
First, you can implement the ProcessorInterface, which is quite straightforward.
class StdoutPrinter implements ProcessorInterface
{
    public function process(LanguageInterface $language, &$item)
    {
        $data = "$item\n";
        echo $data;
    }
}
Second, you can use a CustomProcessor, passing it a callback function with the same signature as the process method.
$stdoutPrinter = new CustomProcessor(
    function (LanguageInterface $language, &$item) {
        $data = "$item\n";
        echo $data;
    }
);
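Because the item is passed by reference, a callback can presumably also change the word rather than only read it. The sketch below shows a hypothetical `$capitalizer` callback and invokes it directly to demonstrate the effect. The empty interface is a stand-in so the snippet runs without the library, and the assumption that later processors see the modified item is not confirmed here.

```php
<?php

// Stand-in so the sketch runs without the library; the real interface has methods.
interface LanguageInterface
{
}

// Hypothetical callback: capitalizes the first letter of each item in place.
$capitalizer = function (LanguageInterface $language, &$item) {
    $item = ucfirst($item);
};

// With the library installed, the callback would be passed to the processor:
// $parser->addProcessor(new CustomProcessor($capitalizer));

// Direct invocation, only to show the effect of the by-reference parameter.
$language = new class implements LanguageInterface {};
$item = 'apple';
$capitalizer($language, $item);
echo $item, "\n";
```

After the call, `$item` holds `Apple`, since the callback received it by reference.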