Stopwords in multiple languages that you can easily use with your PHP applications.
The package currently provides stopwords for the following languages:
- Arabic
- Azerbaijani
- Bengali
- Danish
- Dutch
- English
- Finnish
- French
- German
- Greek
- Hungarian
- Indonesian
- Italian
- Kazakh
- Nepali
- Norwegian
- Portuguese
- Romanian
- Russian
- Slovene
- Spanish
- Swedish
- Tajik
- Turkish
Requires PHP 8.0+
You can install the package via Composer:

```bash
composer require masroore/stopwords
```
```php
<?php

// Composer's autoloader makes the Kaiju\Stopwords classes available
require __DIR__ . '/vendor/autoload.php';

$stopwords = new Kaiju\Stopwords\Stopwords();

// get the list of available languages
print_r($stopwords->getLanguages());

// load stopwords for a language
$stopwords->load('english');

// load stopwords for multiple languages
$stopwords->load(['english', 'french']);

// load stopwords for all available languages
$stopwords->load('*');

// check whether the given word is a stopword
$stopwords->isStopword('the'); // TRUE
$stopwords->isStopword('America'); // FALSE

// return a tokenized copy of the text, with stopwords and punctuation marks removed
$text = "Good muffins cost $3.88\nin New York. Please buy me two of them.\n\nThanks!\n";
print_r($stopwords->strip($text));
// ["Good","muffins","cost","$3.88","New","York","Please","buy","two","Thanks"]

// return the same filtered tokens joined into a single string
echo $stopwords->clean($text);
// "Good muffins cost $3.88 New York Please buy two Thanks"
```
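To show roughly what `isStopword()`, `strip()` and `clean()` do, here is a minimal standalone sketch of the general technique: a hash-set lookup plus whitespace tokenization. It is not this package's implementation. The class name `SimpleStopwordFilter`, the four-word stopword list, the ASCII-only case folding and the set of punctuation characters trimmed from each token are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stopword filter, for illustration only.
 * NOT the Kaiju\Stopwords implementation; it just shows the general technique.
 */
final class SimpleStopwordFilter
{
    /** @var array<string, true> stopwords stored as keys for constant-time lookup */
    private array $lookup = [];

    /** @param list<string> $words */
    public function __construct(array $words)
    {
        foreach ($words as $word) {
            // Assumption: matching is case-insensitive (ASCII-only via strtolower;
            // multilingual text would need mb_strtolower instead).
            $this->lookup[strtolower($word)] = true;
        }
    }

    public function isStopword(string $word): bool
    {
        return isset($this->lookup[strtolower($word)]);
    }

    /** @return list<string> tokens that are not stopwords */
    public function strip(string $text): array
    {
        // Split on any run of whitespace, dropping empty tokens.
        $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        $result = [];
        foreach ($tokens as $token) {
            // Assumption: only this punctuation set is trimmed from token edges.
            $token = trim($token, ".,;:!?\"'()");
            if ($token !== '' && !$this->isStopword($token)) {
                $result[] = $token;
            }
        }
        return $result;
    }

    public function clean(string $text): string
    {
        return implode(' ', $this->strip($text));
    }
}

// Usage with a hypothetical four-word stopword list.
$filter = new SimpleStopwordFilter(['in', 'me', 'of', 'them']);
$text = "Good muffins cost \$3.88\nin New York. Please buy me two of them.\n\nThanks!\n";
print_r($filter->strip($text));
echo $filter->clean($text), PHP_EOL;
```

With that hypothetical word list, the sketch produces the same output as the README example. Storing the words as array keys keeps each check a constant-time `isset()` rather than a linear `in_array()` scan, which matters once several languages are loaded at once.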
To run the test suite:

```bash
composer test
```
Please see CHANGELOG for more information on what has changed recently.
Thank you for considering contributing to this package. All the contribution guidelines are mentioned here.
Please review our security policy on how to report security vulnerabilities.
This package is open-source software licensed under the MIT license.