sintaxi/harp

White Space is Stripped from HTML

linusbobcat opened this issue · 5 comments

It seems that Harp strips all whitespace indentation from the HTML when it serves or compiles pages. Strangely, though, the empty lines where EJS tags used to be are preserved.

This is usually a non-issue, since compiled HTML isn't meant to be read or edited directly. However, it also removes all the indentation inside my <pre> and <code> tags, where whitespace matters.

I noticed some commented-out CLI flags for preserving whitespace and indentation. They don't seem to be wired up to anything, but is it possible to enable them somehow? I would rather not edit my HTML by hand after compilation.

Additional details can be provided if necessary.

And regardless of everything, thank you for maintaining Harp.

First of all, thank you for taking care of Harp!

We recently updated to the new version of Harp, and unfortunately this is a serious problem for us. We have many code examples in <pre> and <code> tags that need exact formatting to be copy-pasteable, and fixing them manually would take a significant amount of time.

Do you know what might have caused this, and whether there is a workaround?

Hmm, a bunch of redundant minification that didn't add much value to Harp was removed. Could you provide an example of one of your templates so I can take a look?

The problem doesn't seem to depend on any particular layout, template, or EJS feature.
Compiling the following index.ejs produces the output below:

Original

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>test</title>
    </head>
    <body>
        <h1>a header</h1>
        <p>a paragraph</p>
        <% if(locals.test) { %>
        <% }; %>
    <pre>
        <code>
p {
    font-size: 16px;
    font-family: sans-serif;
}
        </code>
    </pre>
    </body>
</html>

Compiled

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>test</title>
</head>
<body>
<h1>a header</h1>
<p>a paragraph</p>

<pre>
<code>
p {
font-size: 16px;
font-family: sans-serif;
}
</code>
</pre>
</body>
</html>

The EJS block is just a dummy, but notice that the <pre> and <code> blocks have been flattened to column zero, and that a blank line remains where the EJS block used to be.

@gbielskiqt, I used a very hacky workaround. I prefixed every line inside my code tags with a marker, like so:

<pre>
<code>
@@ p {
@@    color: red;
@@ }
</code>
</pre>

Then, after compilation, I ran an equally hacky sed script to replace the "@@" markers with spaces, something like this:

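# Run after compilation. `-i ''` is the BSD/macOS form of in-place editing;
# with GNU sed, use `-i` on its own.
for FILE in ./writing/*.html; do
    sed -i '' -e 's/@@/ /' "$FILE"
done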

Obviously, if your actual code contains "@@", pick a different marker.
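A minimal Node.js sketch of the same marker trick, in case it helps to script both halves. `addMarkers` and `stripMarkers` are hypothetical helper names. The sketch assumes the compiler strips whitespace only at the start (and end) of each line, which is what the compiled output above suggests.

```javascript
// Hypothetical helpers sketching the "@@" marker workaround.
// Assumption: the compiler only strips whitespace *before* the first
// non-space character of a line, so indentation after a marker survives.

// Before compilation: prefix every line of a code snippet with the marker.
// This also keeps empty lines from being collapsed, since "@@" is not blank.
function addMarkers(code, marker = '@@') {
  return code
    .split('\n')
    .map((line) => marker + line)
    .join('\n');
}

// After compilation: remove the marker from the start of each line,
// leaving whatever indentation followed it intact.
function stripMarkers(html, marker = '@@') {
  return html
    .split('\n')
    .map((line) => (line.startsWith(marker) ? line.slice(marker.length) : line))
    .join('\n');
}

const snippet = 'p {\n    color: red;\n}';
const marked = addMarkers(snippet);

// Simulate a compiler that strips leading whitespace from every line.
const compiled = marked
  .split('\n')
  .map((line) => line.trimStart())
  .join('\n');

console.log(stripMarkers(compiled) === snippet); // true
```

Unlike the sed script, `stripMarkers` deletes the marker outright instead of replacing it with a space. It also only touches the start of a line, so a stray "@@" elsewhere in the code is left alone.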

smnsc commented

I have the same issue, for very similar reasons: I often post <pre><code> snippets for easy copy-pasting.

I worked around it with some client-side post-processing, using a function I adapted from Stack Overflow to reformat the HTML.

The function below takes an HTML string and returns a formatted HTML string that you can reinsert into your document.

/* Adapted from: https://stackoverflow.com/a/26361620/216104 */
function formatHtml(htmlString) {
    const div = document.createElement('div');
    div.innerHTML = htmlString.trim();

    // Recursively insert "newline + indentation" text nodes between element
    // children. Children of a node passed in at `level` are indented by
    // `level` steps; that node's own closing tag sits one step less.
    const format = function (node, level) {
        const indentBefore = '  '.repeat(level);
        const indentAfter = '  '.repeat(Math.max(level - 1, 0));

        for (let i = 0; i < node.children.length; i++) {
            // `children` only lists elements, so inserting text nodes
            // does not shift these indices.
            node.insertBefore(document.createTextNode('\n' + indentBefore), node.children[i]);

            format(node.children[i], level + 1);

            // After the last element child, indent the parent's closing tag.
            if (node.lastElementChild === node.children[i]) {
                node.appendChild(document.createTextNode('\n' + indentAfter));
            }
        }

        return node;
    };

    return format(div, 0).innerHTML;
}

I took a quick look at it: https://github.com/sintaxi/terraform/blob/cbd673212b246e76d64c33a43c9059625640e32c/lib/template/processors/ejs.js#L8

👀

It was introduced here: sintaxi/terraform@13bfd03#diff-7bc4d9d7c5ecce5be75d3e86cf54c1ba33481b10d78426d46290850f2ec17a9bR8

From the EJS options documentation:

rmWhitespace: Remove all safe-to-remove whitespace, including leading and trailing whitespace. It also enables a safer version of -%> line slurping for all scriptlet tags (it does not strip new lines of tags in the middle of a line).
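To see why this option flattens <pre> and <code> blocks, here is a rough JavaScript sketch of the preprocessing. As far as I can tell from the EJS source, enabling rmWhitespace runs two regular expressions over the raw template text. The first collapses runs of line breaks, and the second trims leading and trailing whitespace on every line. The function name `stripLikeRmWhitespace` is hypothetical, and this is an approximation rather than EJS's actual code.

```javascript
// Hypothetical approximation of EJS's rmWhitespace preprocessing
// (not EJS's real code). It operates on the raw template text before
// any tags are evaluated, so it has no notion of <pre> or <code>.
function stripLikeRmWhitespace(templateText) {
  return templateText
    // Collapse any run of \r / \n characters into a single \n,
    // which also deletes blank lines.
    .replace(/[\r\n]+/g, '\n')
    // With the m flag, ^ and $ match at every line boundary, so this
    // trims leading and trailing whitespace from each line.
    .replace(/^\s+|\s+$/gm, '');
}

const template = [
  '    <pre>',
  '        <code>',
  'p {',
  '    font-size: 16px;',
  '}',
  '        </code>',
  '    </pre>',
  '',
].join('\n');

// Every line comes out flush left, including the CSS inside <code>.
console.log(stripLikeRmWhitespace(template));
```

Because the trimming runs over the whole template, it cannot distinguish a <pre> block from ordinary markup, which matches the compiled output in the issue. Presumably, dropping rmWhitespace from the EJS options in the processor linked above would bring the indentation back.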