Style attributes are getting stripped off
Buvi1234 opened this issue · 13 comments
If I sanitize the style attribute, its value is stripped.
For example:
"
<span style=\\\"color: inherit; background: inherit;\\\"><span dir=\\\"ltr\\\" style=\\\"color: inherit; background: inherit;\\\">
This is the data I give, and the output is like this:
"
<span dir='\"ltr\"' style="">
I have allowed the style attribute and the p and span tags, and I have passed color and background to the CSS sanitizer, but the style value is still getting stripped out.
Any reason?
Can you put that into a test or a python script so it's easier to see what you're doing?
(Pdb) bleach.clean('
<span style=\"color: inherit; background: inherit;\
', tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES), strip=True)
output is
Why is it happening this way? The style contents are not retained here. Any reason?
But for class attributes the output comes out properly.
#91 --> same issue
Here's what I think you're trying to do based on the last couple of comments:
from bleach import (
    clean,
    ALLOWED_ATTRIBUTES,
    ALLOWED_TAGS,
)
from bleach.css_sanitizer import (
    ALLOWED_CSS_PROPERTIES,
    CSSSanitizer,
)

print(clean(
    '<p><span style=\\"color: inherit; background: inherit;\\</span></p>',
    tags=ALLOWED_TAGS,
    attributes=ALLOWED_ATTRIBUTES,
    css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES),
    strip=True,
))
The text that's being cleaned is missing a lot of HTML things. It's got a \" and it's missing an end " and the closing of the first <span> tag. When the Bleach parser goes through that, it's seeing HTML-like things and trying to complete them and then that probably results in it dropping stuff.
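To see what the parser is actually given, it can help to print the string before cleaning it. Here's a small sketch in plain Python (no Bleach involved; the variable names are made up for illustration) showing that `\\"` in a string literal leaves a literal backslash in the text, while `\"` is just a double quote:

```python
# In a normal (non-raw) Python string literal:
#   \\  is a single backslash character
#   \"  is a single double-quote character
escaped_twice = '<span style=\\"color: inherit;\\">'
escaped_once = '<span style=\"color: inherit;\">'

print(escaped_twice)  # the backslashes are part of the text
print(escaped_once)   # plain double quotes, no backslashes

# Check whether a backslash ended up in each string.
has_backslash_twice = "\\" in escaped_twice
has_backslash_once = "\\" in escaped_once
print(has_backslash_twice, has_backslash_once)
```

If the first form is what reaches Bleach, the attribute value no longer starts with a quote character, so the markup is not the HTML it appears to be at a glance.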
On top of that, the set of ALLOWED_TAGS doesn't include p or span. You would need to include those tags for them to not get stripped. You'd need to do something like this:
from bleach import clean, ALLOWED_TAGS
my_tags = set(["p", "span"] + list(ALLOWED_TAGS))
clean("<p><span>something</span></p>", tags=my_tags)
The set of ALLOWED_TAGS is documented here: https://bleach.readthedocs.io/en/latest/clean.html#allowed-tags-tags
For CSS properties, you're using the default set which does include color, but doesn't include background. You'd need to pass in a set of properties that includes background.
The set of ALLOWED_CSS_PROPERTIES isn't documented, but you can see it in the code here:
bleach/bleach/css_sanitizer.py, line 4 in c04958d
Does that help?
Yeah, to some extent. In the allowed tags I have the span and p tags, and in allowed_attributes I have mentioned
*: {['style', "class"]}
but the style attribute value is being stripped. Any reason?
Please provide a complete Python script showing the problem you're having. Something I can put in a file and run python example_script.py to see the problem.
I have attached what I am trying to do, please have a look:
import bleach
from bleach.css_sanitizer import CSSSanitizer

ALLOWED_TAGS = ['abbr', 'acronym', 'b', 'br', 'div', 'dl', 'dt',
    'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr',
    'i', 'li', 'ol', 'p', 'q', 's', 'small', 'strike',
    'strong', 'span', 'sub', 'sup', 'table', 'tbody',
    'td', 'th', 'thead', 'tr', 'tt', 'u', 'ul', 'address', 'img', 'pre', 'tfoot', 'bdo',
    'big', 'blockquote', 'center', 'cite', 'code', 'dd', 'del', 'dfn', 'embed', 'font', 'ins', 'kbd',
    'param', 'samp', 'var', 'wbr']
ALLOWED_ATTRIBUTES = {
    '*': ['class', 'title', 'style', 'align', 'cite', 'size', 'type', 'dir'],
    'img': ['src', 'alt'],
    'table': ['border', 'width', 'height'],
}
ALLOWED_CSS_PROPERTIES = ["color", "font-weight", "width", "border-bottom", "padding",
    "background-color", "font-size", "display", "box-sizing",
    "line-height", "white-space", "height", "line-height", "font-size",
    "font", "border-width", "text-align", "border-color", "font-family",
    "border-left-color", "border-right-color", "border-top-color", "font-color",
    'text-decoration', "background"]

val = '<span style=\\\"color: inherit; background: inherit;\\\"><span dir=\\\"ltr\\\" style=\\\"color: inherit; background: inherit;\\\">'
val = bleach.clean(val,
    tags=ALLOWED_TAGS,
    attributes=ALLOWED_ATTRIBUTES,
    css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES),
    strip=True)
print(val)
You've got a lot of extra \ which is causing the attribute values to be invalid HTML and my guess is they're getting dropped. If you change val to be this:
<span style="color: inherit; background: inherit;"><span dir="ltr" style="color: inherit; background: inherit;">
Then you get this as output:
<span style="color: inherit; background: inherit;"><span dir="ltr" style="color: inherit; background: inherit;"></span></span>
Does that look like what you're looking for?
Yes, but can we achieve it without removing the \? The dir attribute was working fine. Can the same be done for the style attribute, or is there any other way to achieve it?
No, because all the \ make it nonsensical input data. Bleach clean will make sure the output is valid HTML, but it's not going to fix the input data in that way to make it less nonsensical. You'll need to write some code to fix the input text before running it through Bleach clean.
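As a starting point, here's a minimal sketch of that kind of pre-processing step. It assumes the only problem in the input is backslash-escaped quotes; `unescape_quotes` is a hypothetical helper name, not something Bleach provides:

```python
def unescape_quotes(text: str) -> str:
    """Replace backslash-escaped quotes with plain quotes."""
    # '\\"' is the two-character sequence backslash + double quote.
    return text.replace('\\"', '"').replace("\\'", "'")


# Input with literal backslashes in front of every double quote.
raw = '<span style=\\"color: inherit; background: inherit;\\"><span dir=\\"ltr\\">'
fixed = unescape_quotes(raw)
print(fixed)
# → <span style="color: inherit; background: inherit;"><span dir="ltr">
```

The result of `unescape_quotes` is what you would then pass to `bleach.clean`. If the text is really a JSON-encoded string, decoding it properly at the source is likely a better fix than patching quotes afterwards.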
Thank you for the info.
I just need some clarity: in the comments above I attached a debugger screenshot where the class attribute was able to retain some color value but style was not. Any reason behind that?
I can't speculate on the pdb screenshot because it's missing a lot of context. Can you put it into a Python script so I can investigate?
#529
import bleach
from bleach.css_sanitizer import CSSSanitizer

ALLOWED_TAGS = ['abbr', 'acronym', 'b', 'br', 'div', 'dl', 'dt',
    'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr',
    'i', 'li', 'ol', 'p', 'q', 's', 'small', 'strike',
    'strong', 'span', 'sub', 'sup', 'table', 'tbody',
    'td', 'th', 'thead', 'tr', 'tt', 'u', 'ul', 'address', 'img', 'pre', 'tfoot', 'bdo',
    'big', 'blockquote', 'center', 'cite', 'code', 'dd', 'del', 'dfn', 'embed', 'font', 'ins', 'kbd',
    'param', 'samp', 'var', 'wbr']
ALLOWED_ATTRIBUTES = {
    '*': ['class', 'title', 'style', 'align', 'cite', 'size', 'type', 'dir'],
    'img': ['src', 'alt'],
    'table': ['border', 'width', 'height'],
}
ALLOWED_CSS_PROPERTIES = ["color", "font-weight", "width", "border-bottom", "padding",
    "background-color", "font-size", "display", "box-sizing",
    "line-height", "white-space", "height", "line-height", "font-size",
    "font", "border-width", "text-align", "border-color", "font-family",
    "border-left-color", "border-right-color", "border-top-color", "font-color",
    'text-decoration', "background"]

val = '<span style=\"color: inherit; background: inherit;\"><span dir=\"ltr\" style=\"color: inherit; background: inherit;\">'
val = bleach.clean(val,
    tags=ALLOWED_TAGS,
    attributes=ALLOWED_ATTRIBUTES,
    css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES),
    strip=True)
print(val)
output: '
<span dir='\"ltr\"' style="">
'
val = '
<span style='"color:': inherit; background: inherit;\"><span dir=\"ltr\" style='"color:': inherit; background: inherit;\">'
val = bleach.clean(val,
tags=ALLOWED_TAGS,
attributes=ALLOWED_ATTRIBUTES,
css_sanitizer=CSSSanitizer(ALLOWED_CSS_PROPERTIES),
strip=True)
print(val)
The output is for reference.
The style attribute does not support dashes, but other attributes support them.
I get the following output from the first blurb:
<span style="color: inherit; background: inherit;"><span dir="ltr" style="color: inherit; background: inherit;"></span></span>
That output looks fine to me.
The second is malformed Python, so it doesn't execute.
At this point, I can't continue helping you with this. I don't see anything in here that suggests there's a bug in Bleach--it sure seems like it's doing what it should be doing. I think I've given you enough insight here to help you continue exploring why Bleach isn't doing what you want it to do. Hope that helps!