feature: strip all URLs
jvanasco opened this issue · 2 comments
It does not seem possible to strip all URLs with Bleach.
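For example, the closest we can get, based on the docs, is:
```python
import bleach


def remove_it(attrs, new=False):
    # Returning None from a linkify callback tells Bleach not to keep the link
    # (the <a> tag); the URL text itself is left in place.
    return None


payloads = (
    'a <a href="http://example.com/outer">https://example.com/inner</a> b',
    "a https://example.com/bare b",
)

for payload in payloads:
    print("=====")
    # Attempt 1: linkify with a callback that rejects every link.
    result = bleach.linkify(payload, callbacks=[remove_it])
    print(result)
    # Attempt 2: clean with no allowed protocols, which only drops the href value.
    result = bleach.clean(payload, protocols=[])
    print(result)
```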
However, the result is:
```
=====
a https://example.com/inner b
a <a>https://example.com/inner</a> b
=====
a https://example.com/bare b
a https://example.com/bare b
```
The desired result is simply:
```
=====
a b
a b
=====
a b
a b
```
In many situations involving user-generated content, it is desirable to prevent URLs entirely, even when they would only be rendered as plain text. Currently this has to be handled outside of Bleach, in a separate processing step. Doing it within Bleach would be preferable, since Bleach has already parsed the URLs.
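A minimal sketch of such a separate step, assuming the content can be reduced to plain text first (for example with `bleach.clean(payload, tags=set(), strip=True)`, which strips every tag but keeps its text). The `strip_urls` helper and its pattern are hypothetical: the pattern only catches scheme-prefixed URLs, so bare domains such as `example.com`, which `linkify` would still detect, slip through.

```python
import re

# Hypothetical pattern: only URLs with an explicit scheme are matched, the URL
# is assumed to run until the next whitespace character, and the spaces or
# tabs around it are captured too.
URL_PATTERN = re.compile(r"[ \t]*(?:https?|ftp)://\S+[ \t]*", re.IGNORECASE)


def strip_urls(text: str) -> str:
    """Remove scheme-prefixed URLs from plain text (hypothetical helper)."""
    # Replace each URL, plus the spaces around it, with a single space so that
    # "a URL b" becomes "a b" rather than "a  b".
    collapsed = URL_PATTERN.sub(" ", text)
    # A URL at the very start or end leaves a stray space; trim leading and
    # trailing whitespace from the result.
    return collapsed.strip()


for sample in ("a https://example.com/inner b", "a https://example.com/bare b"):
    print(strip_urls(sample))  # prints "a b" both times
```

The drawback is exactly the one described: the text is scanned for URLs a second time, with a cruder pattern than the one Bleach already applied.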
I think you need to write a new filter. You could probably base it on the current `LinkifyFilter` and change this part:
Lines 316 to 332 in 4f951d3
Does that help?
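A rough sketch of what such a filter might do, written against plain dicts shaped like html5lib tokens so it runs without Bleach installed. The class name `StripURLsFilter` and its URL pattern are hypothetical, and rather than patching the linkify loop referenced above, it takes a simpler route: it drops existing `<a>` elements together with their text and deletes bare URLs from text tokens.

```python
import re
from typing import Dict, Iterable, Iterator, List

# Hypothetical URL pattern: scheme-prefixed URLs only, ending at whitespace.
URL_RE = re.compile(r"(?:https?|ftp)://\S+", re.IGNORECASE)

Token = Dict[str, object]


class StripURLsFilter:
    """Hypothetical filter that removes URLs instead of linkifying them.

    It follows the shape of an html5lib filter (built from a ``source`` token
    stream, iterated to get the filtered tokens). Tokens are dicts such as
    {"type": "Characters", "data": "..."} or {"type": "StartTag", "name": "a"}.
    """

    def __init__(self, source: Iterable[Token]) -> None:
        self.source = source

    def __iter__(self) -> Iterator[Token]:
        depth = 0  # > 0 while inside an <a> element that is being dropped
        pending: List[str] = []  # text buffered so it can be cleaned as one run
        for token in self.source:
            kind = token["type"]
            name = token.get("name")
            # Drop existing links entirely: the tag and everything inside it.
            if kind == "StartTag" and name == "a":
                depth += 1
                continue
            if kind == "EndTag" and name == "a":
                depth = max(depth - 1, 0)
                continue
            if depth:
                continue
            # Buffer text so "a " and " b" around a dropped link get merged.
            if kind in ("Characters", "SpaceCharacters"):
                pending.append(str(token["data"]))
                continue
            if pending:
                yield self._text_token(pending)
                pending = []
            yield token
        if pending:
            yield self._text_token(pending)

    @staticmethod
    def _text_token(parts: List[str]) -> Token:
        # Remove bare URLs, then collapse the runs of spaces left behind.
        # Note: this also collapses any other repeated spaces in the text.
        text = URL_RE.sub(" ", "".join(parts))
        text = re.sub(r"[ \t]{2,}", " ", text)
        return {"type": "Characters", "data": text}


# Tokens roughly as html5lib would produce them for the first payload.
payload_tokens = [
    {"type": "Characters", "data": "a "},
    {"type": "StartTag", "name": "a", "namespace": None,
     "data": {(None, "href"): "http://example.com/outer"}},
    {"type": "Characters", "data": "https://example.com/inner"},
    {"type": "EndTag", "name": "a", "namespace": None},
    {"type": "Characters", "data": " b"},
]
print(list(StripURLsFilter(payload_tokens)))
# prints [{'type': 'Characters', 'data': 'a b'}]
```

In real use it would presumably subclass html5lib's `Filter` base class and be passed to `bleach.sanitizer.Cleaner(filters=[...])`, which is how Bleach's documentation wires in custom filters; that wiring is untested here. Collapsing repeated spaces is a deliberate simplification, since browsers render them as a single space outside `<pre>` anyway.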
I'm assuming that helped. Closing this out.