pugjs/pug-runtime

pug_escape is heavy

Opened this issue · 17 comments

Hi team,
I think pug_escape is the most used function of the runtime and is heavy, this is a lighter and faster alternative:

function pug_repchar(c){
  return c === '"' ? '"'
       : c === '&' ? '&'
       : c === '<' ? '&lt;' : '&gt;'
}

function new_escape(_html){
  return _html && String(_html).replace(pug_match_html, pug_repchar)
}

With this strings:

var teststr = [null, '', 'go >', 'foo', ' bar', 'some long string with nothing', 'father & son',
               'another long string with entities in "this"']

this is the output of benchmark.js

alberto@amarcruz /mnt/Data/Proyectos/_riot/riot $ node test/_new-test.js 
new_escape x 2,100,311 ops/sec ±1.19% (83 runs sampled)
pug_escape x 1,648,968 ops/sec ±1.14% (86 runs sampled)
Fastest is new_escape
alberto@amarcruz /mnt/Data/Proyectos/_riot/riot $ node test/_new-test.js 
new_escape x 2,067,200 ops/sec ±1.14% (86 runs sampled)
pug_escape x 1,626,395 ops/sec ±1.21% (85 runs sampled)
Fastest is new_escape
alberto@amarcruz /mnt/Data/Proyectos/_riot/riot $ node test/_new-test.js 
new_escape x 2,089,361 ops/sec ±1.20% (86 runs sampled)
pug_escape x 1,636,246 ops/sec ±1.19% (85 runs sampled)
Fastest is new_escape
alberto@amarcruz /mnt/Data/Proyectos/_riot/riot $ node test/_new-test.js 
new_escape x 1,823,814 ops/sec ±1.16% (89 runs sampled)
pug_escape x 1,494,807 ops/sec ±1.37% (86 runs sampled)
Fastest is new_escape
alberto@amarcruz /mnt/Data/Proyectos/_riot/riot $ node test/_new-test.js 
new_escape x 1,835,993 ops/sec ±1.18% (84 runs sampled)
pug_escape x 1,510,148 ops/sec ±1.22% (88 runs sampled)
Fastest is new_escape

Hi @aMarCruz
thanks for your input. It seems that v8 has optimized String.prototype.replace since we last discussed this function: #8

Your strings are a little biased towards needing escaping, skewing the result. We anticipate that the vast majority of input does not need escaping (see previous discussions).

I've created a jsperf to use the same strings that we used in the past: https://jsperf.com/pug-escape-perf
You could also use your own, but I'd suggest to add more longer strings that need no escaping.
Two things stand out:

  1. On Chrome/v8, your suggestion is 70% faster for input that needs escaping, but 50% slower on input that does not need escaping.
  2. On Firefox, your suggestion 30-50% slower in both cases

My first impression would be to keep the current implementation unless you see further improvements that maintain the speed for input that does not need escaping. For example, you could simplify modify the existing pug function to start as it currently does (running .exec to quick-scan for a match at position i), but then replacing the for loop with a simple html[0..i-1] + html[i..-1].replace(...). Might be interesting to see what numbers that would give.

@alubbe , I was thinking about size as well, but you're right.

And running .exec does some difference, but using partial replace() slows the function; because the additional concatenation? Anyway, trying micro-op this increases complexity with no guarantee it will work in another browser.

Thanks, closing.

Thanks both of you for putting the time into this. It is crucial that we get the optimisation of the escape function right. I'm pleased that we think it's currently about as good as we can make it.

Any time - I love benchmarking and optimizing perf :)

I know this is not really the place to ask, but we have a lot of

pug.escape(null == (pug_interp = a) ? "" : pug_interp)

in the pug generated javascript. would it make sense to have pug.escape handle the null case and simply call pug.escape(a) ? The generated code would be smaller + it could maybe be optimized better inside the function than for every ternary invocation.

as an example,

- var s = "hello";
p #{s} world, #{s} !

generates

var s = "hello";
pug_html = pug_html + "\n\u003Cp\u003E" + pug.escape(null == (pug_interp = s) ? "" : pug_interp) + " world, " + pug.escape(null == (pug_interp = s) ? "" : pug_interp) + " !\u003C\u002Fp\u003E";

which could become

var s = "hello";
pug_html = pug_html + "\n\u003Cp\u003E" + pug.escape(s) + " world, " + pug.escape(s) + " !\u003C\u002Fp\u003E";

an optional boolean on escape could also be used to keep a "pure" escaping

we could also consider that null values should not be treated with special code, but that is another story ;-)

It is absolutely your place to ask. I think this is well worth considering. I can't think of any cases where we call escape with null or undefined and don't want it to return the empty string. We should do this (could speed things up quite a bit).

The only place to change in pug-code-gen would be

https://github.com/pugjs/pug-code-gen/blob/master/index.js#L663

  val = 'null == (pug_interp = '+val+') ? "" : pug_interp';
  if (code.mustEscape !== false) val = this.runtime('escape') + '(' + val + ')';

into

  if (code.mustEscape !== false) {
    val = this.runtime('escape') + '(' + val + ')';
  } else {
    val = 'null == (pug_interp = '+val+') ? "" : pug_interp';
  }

and find an optimized way to add the null test in escape. @alubbe do you know a simple way to benchmark pug as a whole ?

The simplest I can think of would be to clone & extend these end-to-end benchmarks: https://github.com/alubbe/jade-ejs-bench
All it needs is a new 'pug' folder and a 'pug.js' file (essentially the same as 'jade.js'). Then you can run the current pug version vs. a linked local version.

I re-open this issue instead of opening a new one.
I looked a bit a the backward compatibility of this (before looking at the benchmark).

my proposed patch on pug.escape would be

function pug_escape(_html){
  if (_html == null) return '';
  var html = '' + _html;
  var regexResult = pug_match_html.exec(html);
  if (!regexResult) return html;
  ...

instead of

function pug_escape(_html){
  var html = '' + _html;
  var regexResult = pug_match_html.exec(html);
  if (!regexResult) return _html;

where if (_html == null) return ''; is used to transform null and undefined into an empty string.
and where if (!regexResult) return html; would always return the string cast of the input since I personnaly expect pug.escape to return a string.

When combined with the proposed patch on pug-code-gen this has one side-effect because of how pug-attrs compiles attributes.

It breaks one test :

- var bar = null
div(foo=null bar=bar)&attributes({baz: 'baz'})

where we currently expect

<div baz="baz"></div>
null seems to be used an attribute deleter and I don't think we can use an empty string as an attribute deleter. This would lead me to propose an additional param to pug_escape

These attributes are currently compiled to

(pug.attrs(pug.merge([{"foo": null,"bar": pug.escape(bar)},{baz: 'baz'}]), false))

while my code modifications lead to

(pug.attrs(pug.merge([{"foo": "","bar": pug.escape(bar)},{baz: 'baz'}]), false))

We could fix this either by

  • modifying the way pug-attrs works or
  • by adding a parameter to pug_escape for the cases when the null/undefined escaped value has a meaning
  • by having 2 separate functions for escaping attributes & content if that makes sense.

what do you think ? I will try and use the benchmark mentioned by @alubbe to see if it makes a difference.

@jeromew you're right, currently pug_escape is returning falsy values as-is, something that surprised me. In Riot falsy values that goes to the output returns an empty string (except zero) and the attribute/class is suppressed.

Also, small micro-op is using String() instead concatenation. If _html is already string, the String() becomes a noop. So, to always return string:

function pug_escape(_html){
  if (_html == null) return '';
  var html = String(_html);       // avoid concatenation
  var regexResult = html && pug_match_html.exec(html);  // avoid exec if empty str
  if (!regexResult) return html;  // returns '0', 'Infinity', etc. instead the raw value
...

I tried to benchmark using https://github.com/alubbe/jade-ejs-bench. Here are my results

with modifications
mixin-escape: 17812.982ms
mixin-no-escape: 8774.625ms
no-mixin-escape: 13304.149ms
no-mixin-no-escape: 4968.991ms
simple: 93.696ms

current pug
mixin-escape: 18003.522ms
mixin-no-escape: 8474.700ms
no-mixin-escape: 13556.448ms
no-mixin-no-escape: 5002.959ms
simple: 92.970ms

so it seems that the escaped version wins a little bit by approx 1%.
I do not know if this is sufficient to say something.

The modification on pug-code-gen (removing the interp) leads to less code and I think can only be a win. I would have liked to compare with jsperf the difference it makes on pug_escape only but it seems there is a bug currenly on jsperf and I cannot modify the benchmark.

BTW @alubbe there is a bug in the benchark in pug_escape_new2 which uses pug_repchar instead of pug_repchar2.

The version i used with jade-ejs-bench was

function pug_escape(_html){
  var html = String(_html);
  var regexResult = pug_match_html.exec(html);
  if (!regexResult) return (_html == null) ? '' : html;
  ...

I think that all versions posted here yield roughly the same numbers, so really we are discussing reducing the size of the generated code. I'm in favour of that, but

I can't think of any cases where we call escape with null or undefined and don't want it to return the empty string.

So now we know - null may denote a deleted/obsolete attribute.
In addition to the three options, isn't there a fourth one?

  • Keep the existing behaviour by having pug_escape by returning null/undefined when _html is null/undefined

@alubbe in your 4th option, a template with the simplified code, pug.escape(a) will print "null" when null is the value. That is acceptable to me but may be a hard sell, backward compatibility wise. What do you think ?

You're right, we can't really reconcile that with the current behaviour. Essentially, today we are using two different versions of pug_escape. One returns "" on null/undefined and one returns null/undefined on null/undefined, bringing us back to the three options you described.

Out of options 2 and 3, I prefer having two separate functions over passing a boolean into a single function that then has two different behaviours. E.g.,

function pug_escape_squash_null(html) { return html == null ? "" : pug_escape(html)}

Regarding option 1, what would be a sensible way to change it?

I am not familiar with attributes. If I am not mistaken, we have currently 4 types of code generated

  • A.(pug-code-gen) escaped pug.escape(null == (pug_interp = a) ? "" : pug_interp)
  • B.(pug-code-gen) non escaped (null == (pug_interp = a) ? "" : pug_interp)
  • C.(pug-attrs) html pug.attrs("' + key + '", ' + val + ', ' + stringify(mustEscape) + ', ' + stringify(options.terse) + ')
  • D.(pug-attrs) non-html? pug.escape(val)

I don't see a way where we could not need case B except if we consider that people should not be surprised to see "null"/"undefined" where they pass these. People will need to add this test upstream.

we could have a runtime function pug.squash_null or as you said a pug_escape_squash_null and this would become

  • A.(pug-code-gen) escaped pug.escape(pug.squash_null(a)) or pug.escape_squash_null(a))
  • B.(pug-code-gen) non escaped pug.squash_null(a)
  • C.(pug-attrs) html pug.attrs("' + key + '", ' + val + ', ' + stringify(mustEscape) + ', ' + stringify(options.terse) + ')
  • D.(pug-attrs) non-html? pug.escape(val)

Regarding option 1, what would be a sensible way to change it?

if pug.escape is modified to return '' instead of null, we will need to modifiy the attr genereated code to intercept null/undefined before. That would basically mean that we move some kind of ``(null == (pug_interp = a) ? "" : pug_interp)` from content to attrs. I am not sure that would be a gain ; it is hard to know if we have more dynamic attrs or content.

I am a bit lost here as to where I propose that we go ;-)
Maybe the status quo is the best way to go until further thought has been put into this. It could be interesting for example if someone can take/find time to make an XSS documentation of pug to clarify what is the best practice, and what we are doing.

Indeed, the options I see right now are to add pug.squash_null (not a great improvement over the status-quo) or to re-evaluate how attributes get merged. @ForbesLindesay @TimothyGu any ideas?

The pug-attrs non-html mode is also the reason why pug.escape returns the original value if it's not a string but didn't need escaping. This lets you pass non-string attributes to a mixin and have them retain the same type within the mixin. e.g.

mixin foo()
  - assert(typeof attributes.bar === 'number');

+foo()(bar=10)

As for handling of null, I'm not sure. I think it probably makes sense to optimise for the most common cases which I think should be:

  1. Escaped buffered expressions (i.e. = someExpression)
  2. Escaped attributes (i.e. div(foo=someExpression))

It may be fine to do something that is not as fast or not as little code generated for the other cases.