rrrene/html_sanitize_ex

Write an excerpt scrubber?

aphillipo opened this issue · 2 comments

Hi there @rrrene!

I'm wondering how difficult it would be to write an excerpt scrubber that only takes the first X text nodes inside top-level p and div elements, maybe?

It's difficult to measure these things based on the length of the underlying text, but I think your library will help!

I'd also like the excerpts to look good in the edge cases: if the last text node would come out short, we can decide whether to show the next paragraph at all. For example, with a 255-character limit, a 225-character first paragraph, and a second paragraph that gets chopped off after only 30 characters, showing that fragment is pretty pointless. You could either drop the short last paragraph or allow some extra characters.
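To make the edge case concrete, here is a rough sketch of that heuristic. It assumes the paragraph texts have already been extracted as plain strings, so it does not touch HtmlSanitizeEx at all. The function names and the `slack` and `min_tail` thresholds are hypothetical values picked for illustration, and it is written in Python only to keep it short.

```python
def truncate_at_word(text, max_chars):
    """Cut text to at most max_chars, backing up to a word boundary if possible."""
    cut = text[:max_chars]
    # If the cut lands mid-word, drop the partial word (when there is a space to cut at).
    if len(text) > max_chars and not text[max_chars].isspace() and " " in cut:
        cut = cut[:cut.rfind(" ")]
    return cut.rstrip() + "…"


def build_excerpt(paragraphs, limit=255, slack=40, min_tail=60):
    """Pick paragraphs for an excerpt of roughly `limit` characters.

    slack:    extra characters we tolerate in order to keep a paragraph whole
              (assumed threshold, tune to taste).
    min_tail: shortest truncated fragment still worth showing
              (assumed threshold, tune to taste).
    """
    excerpt = []
    used = 0
    for text in paragraphs:
        remaining = limit - used
        if remaining <= 0:
            break
        if len(text) <= remaining:
            # Fits entirely within the budget: keep it and move on.
            excerpt.append(text)
            used += len(text)
            continue
        overflow = len(text) - remaining
        if overflow <= slack:
            # Only slightly over budget: show the whole paragraph anyway.
            excerpt.append(text)
        elif remaining >= min_tail:
            # Enough room for a meaningful fragment: truncate at a word boundary.
            excerpt.append(truncate_at_word(text, remaining))
        # Otherwise the fragment would be too short to be useful, so drop it.
        break
    return excerpt
```

With the numbers from the example above (limit 255, first paragraph 225 characters), a 100-character second paragraph is dropped because only 30 characters of it would fit, while a 50-character second paragraph is kept whole because it overshoots the limit by just 20.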

It's so damn obvious to a human whether the whole second paragraph belongs in the excerpt just by looking at it. I'd like to get about 80% of the way there :-)

Let me know if you think this is something you want in your sanitizer. It might be considered out of scope!

This is a pretty cool idea and I am confident that you can leverage HtmlSanitizeEx's tools to achieve this.

The reason I would not want it to be part of the library is simply that a lot of people have different definitions of "excerpt", and that fact alone could turn into a maintainer's nightmare. 😭

You can look at the Meta module to see how to deal with tags with children and how to scrub text. 👍

Okay! Feel free to close this for now then :-D Thanks for this library, I'll be using it in lots of places in my app.