php-html2text

A PHP package that converts HTML into plain text, with no HTML tags left in the output.

Overview

masroore/html2text is a PHP package that converts a page of HTML into clean, easy-to-read plain ASCII text.

Installation

Requires PHP 8.0+

You can install the package via Composer:

composer require masroore/html2text

Usage

Extract text from HTML:

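require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Kaiju\Html2Text\Html2Text;

$html = '<h1>Hello</h1><p>Some <b>bold</b> text.</p>';
$converter = new Html2Text();
echo $converter->convert($html); // plain text, no tags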

Callback functions

You can customize the conversion by supplying callbacks for three stages: pre-processing, tag replacement, and post-processing. Each callback in the examples below takes a string and returns the transformed string:

# assign a pre-processing callback function. (transform href links)
$converter->setPreProcessingCallback(fn (string $s) => preg_replace('%<\s*a\b[^>]*href=[\'"](.*?)[\'"][^>]*>([\s\S]*?)<\/\s*a\s*>%i', '$2 ($1)', $s));

# assign a tag-replacement callback function. (replace <li> tags)
$converter->setTagReplacementCallback(fn (string $s) => preg_replace('/<\s*li\b[^>]*>/i', "\n- ", $s));

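# assign a post-processing callback function. (collapse runs of blank lines)
$converter->setPostProcessingCallback(fn (string $s) => preg_replace("/\n{3,}/", "\n\n", $s));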

# process HTML
echo $converter->convert($html);
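Because the callbacks are ordinary PHP closures, you can check them on their own before handing them to the converter. The sketch below does not load the package. It runs a link-rewriting and an `<li>`-replacing pattern like the ones above (with a `\b` word boundary after the tag name, so `<li` does not also match `<link>`) on a small HTML string. It also runs a post-processing step that collapses three or more consecutive newlines, which is only an illustrative choice. It assumes each callback receives the whole intermediate string and returns a string. The variable names are hypothetical.

```php
<?php
// Sketch: exercise example callbacks in isolation (the package is NOT loaded).
// Assumption: each callback takes the intermediate string and returns a string.

// Pre-processing: rewrite <a href="URL">label</a> as "label (URL)".
$rewriteLinks = fn (string $s): string => preg_replace(
    '%<\s*a\b[^>]*href=[\'"](.*?)[\'"][^>]*>([\s\S]*?)<\/\s*a\s*>%i',
    '$2 ($1)',
    $s
);

// Tag replacement: turn each opening <li> tag into a newline followed by "- ".
// The \b keeps the pattern from also matching tags such as <link>.
$bulletItems = fn (string $s): string => preg_replace('/<\s*li\b[^>]*>/i', "\n- ", $s);

// Post-processing (illustrative choice): collapse 3+ consecutive newlines
// into a single blank line.
$collapseBlankLines = fn (string $s): string => preg_replace("/\n{3,}/", "\n\n", $s);

$html = '<p>See <a href="https://example.com">the docs</a>.</p><ul><li>one</li><li>two</li></ul>';

$withLinks = $rewriteLinks($html);       // link replaced by "the docs (https://example.com)"
$withBullets = $bulletItems($withLinks); // each <li> replaced by "\n- "
$collapsed = $collapseBlankLines("line one\n\n\n\nline two");

echo $withLinks, PHP_EOL;
echo $withBullets, PHP_EOL;
echo $collapsed, PHP_EOL;
```

The intermediate strings still contain markup such as `<p>` and `</li>`, because only the callbacks run here. Stripping the remaining tags is left to the converter's own processing.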

Testing

composer test

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Thank you for considering contributing to Html2Text. Please follow the project's contribution guidelines.

Security Vulnerabilities

Please review our security policy on how to report security vulnerabilities.

Credits

License

Html2Text is open-source software licensed under the MIT license.