masroore/html2text is a PHP package that converts a page of HTML into clean, easy-to-read plain ASCII text.
Requires PHP 8.0 or later.
You can install the package via composer:
composer require masroore/html2text
Extract text from HTML:
use Kaiju\Html2Text\Html2Text;
$converter = new Html2Text();
echo $converter->convert($html);
Callback functions
You can customize the conversion by providing callbacks for the pre-processing, tag-replacement, and post-processing stages:
# assign a pre-processing callback function. (transform href links)
$converter->setPreProcessingCallback(fn (string $s) => preg_replace('%<\s*a[^>]*href=[\'"](.*?)[\'"][^>]*>([\s\S]*?)<\/\s*a\s*>%i', '$2 ($1)', $s));
# assign a tag-replacement callback function. (replace <li> tags)
$converter->setTagReplacementCallback(fn (string $s) => preg_replace('/<\s*li[^>]*>/i', "\n- ", $s));
# post-processing hook
$converter->setPostProcessingCallback(...);
# process HTML
echo $converter->convert($html);
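To see what these callbacks do on their own, here is a minimal standalone sketch that applies the same two regular expressions to sample strings without the library. The variable names and sample inputs are illustrative, and `$postProcess` is a hypothetical example of a post-processing step (collapsing extra blank lines), not something the package provides.

```php
<?php
// Standalone sketch: the callbacks as plain closures, run outside the library.

// Pre-processing: rewrite <a href="URL">text</a> as "text (URL)".
$preProcess = fn (string $s): string => preg_replace(
    '%<\s*a[^>]*href=[\'"](.*?)[\'"][^>]*>([\s\S]*?)<\/\s*a\s*>%i',
    '$2 ($1)',
    $s
);

// Tag replacement: turn each opening <li> tag into a newline plus "- ".
$replaceTags = fn (string $s): string => preg_replace('/<\s*li[^>]*>/i', "\n- ", $s);

// Hypothetical post-processing: collapse runs of 3+ newlines to one blank line, then trim.
$postProcess = fn (string $s): string => trim(preg_replace("/\n{3,}/", "\n\n", $s));

$linked = $preProcess('<p>Visit <a href="https://example.com">our site</a>.</p>');
$listed = $replaceTags('<ul><li>One</li><li class="x">Two</li></ul>');
$tidied = $postProcess("Title\n\n\n\nBody\n");

echo $linked, PHP_EOL;
echo $listed, PHP_EOL;
echo $tidied, PHP_EOL;
```

Each closure takes a string and returns a string, which is the shape the callbacks above use. The remaining tags in the output (such as `<p>` or `</li>`) are left for the converter itself to strip.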
Run the test suite with:
composer test
Please see CHANGELOG for more information on what has changed recently.
Thank you for considering contributing to Html2Text. Please see the contribution guidelines for details.
Please review our security policy on how to report security vulnerabilities.
Html2Text is open-source software licensed under the MIT license.