Indented `<script>` tags are parsed as Markdown instead of being skipped.
nbanyan opened this issue · 12 comments
My scenario:
I'm using PyMdown Extensions' snippets to insert a fenced code block containing a bash command. The same snippet has a `<script>` block to pull data from another file (using the same snippet extension) to ensure the command is always accurate.
This works, but breaks if used inside any Markdown block element, such as a list.
Sample test function:
def testBlockInput(self):
    """ Test whether script block is ignored. """
    script = '''* Testing indented script block
    <script>
    if (1 < 2 && 3 > 1) {
        console.log("Success `conditional`!");
    }
    </script>'''
    parsed_script = '''<ul>
<li>Testing indented script block
<script>
if (1 < 2 && 3 > 1) {
    console.log("Success `conditional`!");
}
</script></li>
</ul>'''
    self.assertEqual(self.md.convert(script), parsed_script)
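The same input can be tried outside the test suite. This is a minimal sketch, assuming the `markdown` package is installed. `SOURCE` and `render` are illustrative names, not part of the project's tests, and the 4-space indent is an assumption about the original input.

```python
# Reproduction sketch: a <script> block nested one level inside a list item.
SOURCE = "\n".join([
    "* Testing indented script block",
    "    <script>",
    "    if (1 < 2 && 3 > 1) {",
    '        console.log("Success `conditional`!");',
    "    }",
    "    </script>",
])


def render(text):
    """Convert Markdown to HTML, or return None if Python-Markdown is missing."""
    try:
        import markdown  # third-party: pip install Markdown
    except ImportError:
        return None
    return markdown.markdown(text)


if __name__ == "__main__":
    html_output = render(SOURCE)
    print(html_output if html_output is not None else "Python-Markdown is not installed")
```

Printing the result makes it easy to see whether the operators and backticks inside the script survived the conversion.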
So what specifically isn't working? This isn't very clear from your post.
One suggestion with Python Markdown: when you have separate blocks, put a blank line between them.
* Testing indented script block

    <script>
    if (1 < 2 && 3 > 1) {
        console.log("Success `conditional`!");
    }
    </script>
Regardless of whether there are corner cases where no new line between two blocks works, it is generally suggested that new lines are provided between blocks.
The newline doesn't fix it either, it just adds more `<p>` tags inside the list item.
Test functions:
def testBlockInput(self):
    """ Test whether script block is ignored. """
    script = '''* Testing indented script block
    <script>
    if (1 < 2 && 3 > 1) {
        console.log("Success `conditional`!");
    }
    </script>'''
    parsed_script = '''<ul>
<li>Testing indented script block
<script>
if (1 < 2 && 3 > 1) {
    console.log("Success `conditional`!");
}
</script></li>
</ul>'''
    self.assertEqual(self.md.convert(script), parsed_script)

def testBlockInput2(self):
    """ Test whether script block is ignored. """
    script = '''* Testing indented script block

    <script>
    if (1 < 2 && 3 > 1) {
        console.log("Success `conditional`!");
    }
    </script>'''
    parsed_script = '''<ul>
<li>
<p>Testing indented script block</p>
<p><script>
if (1 < 2 && 3 > 1) {
    console.log("Success `conditional`!");
}
</script></p>
</li>
</ul>'''
    self.assertEqual(self.md.convert(script), parsed_script)
Output:
======================================================================
FAIL: testBlockInput (test_apis.TestMarkdownBasics)
Test whether script block is ignored.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/nathanielclark/Downloads/markdown-3.6/tests/test_apis.py", line 77, in testBlockInput
self.assertEqual(self.md.convert(script), parsed_script)
AssertionError: '<ul>[60 chars]f (1 &lt; 2 &amp;&amp; 3 &gt; 1) {\n [85 chars]/ul>' != '<ul>[60 chars]f (1 < 2 && 3 > 1) {\n console.log([60 chars]/ul>'
<ul>
<li>Testing indented script block
<script>
- if (1 &lt; 2 &amp;&amp; 3 &gt; 1) {
+ if (1 < 2 && 3 > 1) {
- console.log("Success <code>conditional</code>!");
? ^^^^^^ ^^^^^^^
+ console.log("Success `conditional`!");
? ^ ^
}
</script></li>
</ul>
======================================================================
FAIL: testBlockInput2 (test_apis.TestMarkdownBasics)
Test whether script block is ignored.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/nathanielclark/Downloads/markdown-3.6/tests/test_apis.py", line 98, in testBlockInput2
self.assertEqual(self.md.convert(script), parsed_script)
AssertionError: '<ul>[64 chars]f (1 &lt; 2 &amp;&amp; 3 &gt; 1) {\n co[79 chars]/ul>' != '<ul>[64 chars]f (1 < 2 && 3 > 1) {\n console.log("Suc[54 chars]/ul>'
<ul>
<li>
<p>Testing indented script block</p>
<p><script>
- if (1 &lt; 2 &amp;&amp; 3 &gt; 1) {
+ if (1 < 2 && 3 > 1) {
- console.log("Success <code>conditional</code>!");
? ^^^^^^ ^^^^^^^
+ console.log("Success `conditional`!");
? ^ ^
}
</script></p>
</li>
</ul>
----------------------------------------------------------------------
My snippet file: (the base_mkdocs_init.sls:packages inserts a list from the salt file without the closing brackets)
``` {.bash .copy id="pip_install_list" title=""}
pip install Markdown markdown-include mkdocs mkdocs-exclude mkdocs-material mkdocs-material-extensions mkdocs-mermaid2-plugin mkdocstrings mkdocstrings-python Pygments pymdown-extensions PyYAML
```
<script>
code_block_node = document.getElementById("__codelineno-pip_install_list-1").parentElement;
pip_list = [
--8<-- "base_mkdocs_init.sls:packages"
];
if (pip_list.length > 1){
code_block_node.innerHTML =
"pip install " + pip_list.join(' ')
}
</script>
My markdown to insert this snippet:
4. Run the following command from your terminal to install all required modules.
    --8<-- "mkdocs_pip_install.md"
Also, I do use `md_in_html` and tried using `<script markdown='off'>`, but that doesn't work either.
You are using an indented code block (not fenced as you claim) nested in a list item. That means you need 2 levels of indent: 1 for the nesting and a second for the code block. However, you only have one level of indent (4 spaces). 2 levels would require 8 spaces of indent.
* Testing indented script block

        <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
        </script>
The code block is only for displaying the `pip install` command. The `script` element is supposed to be executed to replace the contents of the code block with an updated command, but the JavaScript is breaking because the `>` comparator is changed to `&gt;`.
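The effect on the script can be imitated with the standard library. This sketch uses `html.escape` only to mimic the entity escaping seen in the failing test output; it is not the code path Python Markdown itself takes.

```python
import html

line = "if (1 < 2 && 3 > 1) {"

# Escaping the text as if it were ordinary document content turns the
# JavaScript operators into HTML entities.
escaped = html.escape(line, quote=False)
print(escaped)  # if (1 &lt; 2 &amp;&amp; 3 &gt; 1) {
```

Browsers do not decode entities inside a `<script>` element, so the escaped line is no longer valid JavaScript.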
Yes, HTML blocks aren't handled properly when not at root level. They are recognized, but not always treated the same as they are at document root level. This is unfortunately just the way Python Markdown works currently. Inline HTML is handled fine while nested under other constructs, but block elements will often have their content parsed as Markdown in these circumstances.
Ah, so you want your `<script>` tag to be treated as block-level raw HTML. Note that the Markdown rules state:
The only restrictions are that block-level HTML elements — e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces. Markdown is smart enough not to add extra (unwanted) `<p>` tags around HTML block-level tags.
Pay particular attention to the phrase "the start and end tags of the block should not be indented with tabs or spaces." This effectively means that block-level HTML cannot be nested because they must begin with the first character of a line. In other words, to follow this rule, the parser intentionally does not allow your desired behavior. So @facelessuser you are incorrect when you state that "HTML blocks aren't handled properly when not at root level." This is the correct behavior, which admittedly is counterintuitive. But we didn't write the rules, we just follow them.
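Read literally, the quoted rule amounts to a column-0 check on the opening tag. The following is a simplified sketch of that reading, not Python Markdown's parser; `BLOCK_TAGS` and `starts_raw_html_block` are hypothetical names, the tag list is a small illustrative subset, and a real implementation may be more lenient about small indents.

```python
import re

# Illustrative subset of block-level tag names; real parsers use a longer list.
BLOCK_TAGS = ("div", "table", "pre", "p", "script")

_OPEN_TAG = re.compile(r"<(%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def starts_raw_html_block(line):
    """True if the line opens a block-level tag at column 0 (no indent)."""
    # re.match anchors at the start of the string, so any leading
    # whitespace makes the check fail.
    return _OPEN_TAG.match(line) is not None


print(starts_raw_html_block("<script>"))      # True
print(starts_raw_html_block("    <script>"))  # False
```

Under this reading, a `<script>` tag indented to sit inside a list item never qualifies as a raw HTML block, so its content is handled as ordinary Markdown text.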
So @facelessuser you are incorrect when you state that "HTML blocks aren't handled properly when not at root level.
Fair enough. I tried to express that there is a restriction, but I didn't really stress that it is a rule-based restriction.
Ok. Unfortunately PyMdown Extensions snippet maintains the indentation and doesn't have an option to strip the indentation (selectively or otherwise), so I'll need to change the JavaScript to survive being parsed by Markdown.
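One way to approach that rewrite is to first find the lines Markdown is likely to touch. This is a rough sketch under stated assumptions: `RISKY` and `risky_lines` are hypothetical names, and the character set is a conservative guess at what Markdown or HTML escaping may rewrite, not a list taken from Python Markdown.

```python
# Characters that Markdown or HTML escaping may rewrite (illustrative, not exhaustive).
RISKY = set("<>&`*_")


def risky_lines(script_body):
    """Return (line_number, line) pairs that contain a risky character."""
    return [
        (number, line)
        for number, line in enumerate(script_body.splitlines(), start=1)
        if RISKY & set(line)
    ]


body = 'if (1 < 2 && 3 > 1) {\n    console.log("ok");\n}'
print(risky_lines(body))  # [(1, 'if (1 < 2 && 3 > 1) {')]
```

Lines that are flagged are candidates for rewriting without those characters, or for moving into an external `.js` file.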
I believe I have encountered this issue, which led to immensely confusing behaviour in the `mkdocstrings` library. I am quite concerned, as this behavior allows for XSS attacks on sites that accept Markdown input on the assumption that no code execution can occur. Personally, I don't think it's safe to allow script tags to be included in the output HTML at all, at least by default.
That is an interesting issue, but almost the opposite to the one in this thread.
You could propose for mkdocstrings to have an option to auto-escape `<script>`.
For providing detection and warnings for unclosed HTML tags, I wonder if that would be more a task for MkDocs than for Markdown.
@MaddyGuthridge XSS is a real and serious issue for Markdown, but it is outside the scope of a Markdown parser to handle. The issue is explained in detail by Michel Fortin in Markdown and XSS. As demonstrated there, not allowing script tags is not a reasonable solution. However, that is a very separate issue from this one, as @nbanyan pointed out. Please, let's keep this discussion on-topic.