mathiasbynens/he

`he.decode()` performance compared to browser-based hack

Issue closed · 3 comments

Hello,

Thanks for all your hard work on the library and the awesome documentation!

I recently benchmarked `he.decode()` against this trick, which uses the browser's `<textarea>` element to do the decoding for me.

Surprisingly, `he.decode()` was roughly 2x slower on my test string than the textarea approach. Here is the code I used to run the benchmarks. Change the `<script>` `src` at the top to point to your copy of he.js:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <script src="/he.js"></script>
    <title></title>
</head>
<body>
<script>
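    // Shared textarea, reused on every call to decodeHtmlSameTxtArea().
    var txtArea = document.createElement("textarea");

    // Decode by letting the browser parse the string into the shared textarea.
    function decodeHtmlSameTxtArea(html) {
        txtArea.innerHTML = html;
        return txtArea.value;
    }

    // Same trick, but with a fresh textarea created on every call.
    function decodeHtml(html) {
        var txt = document.createElement("textarea");
        txt.innerHTML = html;
        return txt.value;
    }

    // Iterations per test; the results below were collected with 200 and 100,000.
    var count = 100;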
    var stringToDecode = "hes a&#039;s a&#039;&#039;s a&#039;&#039;&#039;s a&#039;&#039;&#039;&#039;s b&quot;s b&quot;&quot;s b&quot;&quot;&quot;s b&quot;&quot;&quot;&quot;s \\ // &#039;&#039; ::&quot;&quot;&amp;*^ &lt; &gt; &lt;&lt; &gt;&gt;";



    var a, b, i, decodedString;

    // Test 1: create a new textarea on every call.
    a = performance.now();
    for (i = 0; i < count; i++) {
        decodedString = decodeHtml(stringToDecode);
    }
    b = performance.now();
    console.info("Time Taken (using new txtarea each time):", (b - a)/1000, 'seconds.');

    // Test 2: reuse one shared textarea.
    a = performance.now();
    for (i = 0; i < count; i++) {
        decodedString = decodeHtmlSameTxtArea(stringToDecode);
    }
    b = performance.now();
    console.info("Time Taken (using same txtarea):", (b - a)/1000, 'seconds.');

    // Test 3: decode with he.
    a = performance.now();
    for (i = 0; i < count; i++) {
        decodedString = window.he.decode(stringToDecode);
    }
    b = performance.now();
    console.info("Time Taken (using HtmlEntities library function):", (b - a)/1000, 'seconds.');
</script>
</body>
</html>
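The three timing loops above follow the same pattern, so they could be folded into one helper. The following is only a sketch: `timeIt`, its `warmup` parameter, and the `identity` stand-in are hypothetical names that do not appear in the original page, and the warm-up pass is an assumption about what might make short runs less noisy, not something the original benchmark did.

```javascript
// Hypothetical helper (not part of he or the original benchmark):
// times `count` calls of fn(input) after an optional warm-up.
function timeIt(label, fn, input, count, warmup) {
    // Prefer performance.now() where it exists; fall back to Date.now().
    var now = (typeof performance !== "undefined" && performance.now)
        ? function () { return performance.now(); }
        : function () { return Date.now(); };
    var i, last;

    // Untimed warm-up calls, so one-time setup cost is not measured.
    for (i = 0; i < (warmup || 0); i++) {
        last = fn(input);
    }

    var start = now();
    for (i = 0; i < count; i++) {
        last = fn(input);
    }
    var seconds = (now() - start) / 1000;

    return {
        label: label,
        seconds: seconds,
        perCallMicros: (seconds / count) * 1e6,
        last: last // keep the final result so the work is observable
    };
}

// Stand-in decoder so the sketch runs outside a browser.
function identity(s) { return s; }

var result = timeIt("identity", identity, "a&#039;s", 1000, 100);
console.info(result.label + ":", result.seconds, "seconds,",
    result.perCallMicros, "µs per call");
```

In the page above, `identity` would be replaced by `decodeHtml`, `decodeHtmlSameTxtArea`, or a wrapper such as `function (s) { return he.decode(s); }`.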

Just wanted to point out this interesting comparison. It isn't really an issue so you can close this!

Results for 200 iterations:

Time Taken (using new txtarea each time): 0.0018800000000000238 seconds.
Time Taken (using same txtarea): 0.0008049999999999784 seconds.
Time Taken (using HtmlEntities library function): 0.005325000000000017 seconds.

Results using 100,000 iterations:

Time Taken (using new txtarea each time): 1.6731399999999998 seconds.
Time Taken (using same txtarea): 0.6872499999999996 seconds.
Time Taken (using HtmlEntities library function): 3.0460450000000003 seconds.

Tested on Mac OS X 10.11.1 with Chrome 48.0.2564.116, Firefox 45.0, and Safari 9.0.1. The results were fairly consistent across browsers, except that in Safari the HtmlEntities library test (the last test) took 1.6x as long as the first test (a new textarea each time) rather than 2x.
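To make the "2x" figure concrete, the ratios can be recomputed from the timings listed above. This sketch only copies those reported numbers; the `results` object and the `summarize` function are names made up for illustration.

```javascript
// Timings in seconds, copied from the results reported above.
var results = {
    200: {
        newTextarea: 0.0018800000000000238,
        sameTextarea: 0.0008049999999999784,
        he: 0.005325000000000017
    },
    100000: {
        newTextarea: 1.6731399999999998,
        sameTextarea: 0.6872499999999996,
        he: 3.0460450000000003
    }
};

// Ratio of he.decode() time to each textarea variant, plus per-call cost.
function summarize(iterations) {
    var r = results[iterations];
    return {
        heVsNewTextarea: r.he / r.newTextarea,
        heVsSameTextarea: r.he / r.sameTextarea,
        hePerCallMicros: (r.he / iterations) * 1e6
    };
}

[200, 100000].forEach(function (n) {
    var s = summarize(n);
    console.info(n + " iterations:",
        "he vs new textarea " + s.heVsNewTextarea.toFixed(2) + "x,",
        "he vs same textarea " + s.heVsSameTextarea.toFixed(2) + "x,",
        s.hePerCallMicros.toFixed(2) + " µs per he.decode() call");
});
```

With these numbers, `he.decode()` comes out at roughly 1.8x the new-textarea time and 4.4x the shared-textarea time for 100,000 iterations, and at roughly 2.8x and 6.6x for 200 iterations.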

You’re comparing apples and oranges: `he` implements the spec, which not all browsers fully support. Try this test page: https://mathias.html5.org/tests/html/named-character-references/

As for performance, if you see any way to improve it in he without increasing complexity too much, please submit a pull request.
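To illustrate the apples-and-oranges point, here is a deliberately narrow decoder that only understands the references appearing in the benchmark string. It is a hypothetical sketch, not `he`'s implementation: `decodeBasic` and `NAMED` are invented names, and the function skips what a spec-complete decoder has to handle, such as the full named character reference table, hexadecimal references, and legacy references written without a trailing semicolon. A decoder this small has far less work to do, so its speed says little about a complete one.

```javascript
// Hypothetical, deliberately incomplete decoder: NOT he's implementation and
// NOT spec-compliant. It only knows the references used in the test string.
var NAMED = { amp: "&", lt: "<", gt: ">", quot: "\"" };

function decodeBasic(html) {
    return html.replace(/&(?:#(\d+)|(amp|lt|gt|quot));/g, function (match, dec, name) {
        if (dec !== undefined) {
            // Decimal numeric reference such as &#039;.
            var code = parseInt(dec, 10);
            // Leave out-of-range values untouched instead of throwing.
            return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
        }
        return NAMED[name];
    });
}

console.info(decodeBasic("a&#039;s &lt;&lt; &gt;&gt; &quot;&amp;*^"));
```

Anything outside that small set, for example `&copy;`, is returned unchanged by this sketch.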

I see, Mathias. Thanks for the detailed response, and sorry for the faulty comparison!