[Question/bug?] Transformed HTML-to-text still includes "&amp;"
Closed this issue · 4 comments
Hi @stevebauman, thanks for developing this. I'm trying it out on a new project. I'm finding that the transformer does mostly what I'd like it to do, but even though it's decoding some HTML entities like &ldquo;, it's leaving behind &amp;. Is there a reason this one is excluded?
example string: <p>Here's some text that is a bit &ldquo;rough &amp; ready&rdquo;</p>
output: Here's some text that is a bit “rough &amp; ready”
I think this is probably related to using HTMLPurifier, but since it seems the goal is to get to plain text, I'm wondering if maybe an extra step is needed in the transformation pipeline.
[To clarify: I'm using this in the context of preparing text for a Meilisearch index, within a Laravel app.]
Hey @sgilberg, thanks for trying out hypertext!
Let me give this a shot -- I think we may just need to run html_entity_decode() over the result before returning it.
I'm going to classify this as a bug 👍
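For readers following along, here is a minimal sketch of what such an extra decoding step could look like. It is not hypertext's actual code: the helper name decodeRemainingEntities() and the choice of flags are assumptions made for illustration. Only html_entity_decode() itself is a real PHP built-in.

```php
<?php

// Hypothetical helper (not part of hypertext): decode any HTML entities
// that are still present after the HTML has been reduced to plain text.
function decodeRemainingEntities(string $text): string
{
    // ENT_QUOTES decodes both single- and double-quote entities;
    // ENT_HTML5 selects the HTML5 named-entity table.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Plain text that still carries an encoded ampersand.
$text = 'Here\'s some text that is a bit “rough &amp; ready”';

echo decodeRemainingEntities($text), PHP_EOL;
// prints: Here's some text that is a bit “rough & ready”
```

One thing to keep in mind with this approach: decoding also turns &lt; and &gt; back into literal angle brackets, which is fine for plain text headed to a search index but not for text that will be rendered as HTML again.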
Hey @sgilberg,
I've just resolved this in the latest v1.1.1 release.
I've added your example as a test case to ensure it's been covered:
hypertext/tests/Unit/TransformerTest.php
Lines 125 to 130 in 602396e
Run composer update and you're all set! Thanks again for the report 🙏
Thanks @stevebauman, confirmed this now works in my application, and I can now drop my own html_entity_decode() workaround 👍
Excellent, great to hear @sgilberg. Appreciate you reporting back and confirming.