terser/terser

Performance regression for collapse_vars=true with many var defs of complex objects

mlwilkerson opened this issue · 8 comments

Bug report or Feature request?
Bug: 55x performance regression

Version (complete output of terser -V)

terser 3.7.6

Complete CLI command or minify() options used

terser --compress collapse_vars=true --output output.js index.js

terser input

Input is a module with many var defs, where each has a value that is a complex object. Original scenario was a webpack 4 bundle that had imported Font Awesome 5 icons.

This reproduction repo is a more minimal cleaned up demonstration.

Quick link to the input module from that repo.

terser output or error

This version of terser takes about 30 seconds to produce the bundle, depending on hardware. The output is correct, but this is very slow, and becomes significantly slower in other similar real-world scenarios.

Expected result

This should build in <1 second, depending on hardware. I've seen the same code build correctly as quickly as 0.4 seconds in some previous versions of uglify-js and uglify-es that use the same collapse_vars compression option.


I've timed my scenario against several snapshots of the terser code base and found that the performance has changed significantly as work on compress.js:collapse() has been done. For example:

kzc commented

@mlwilkerson Thanks for the report and self-contained repro.

This also affects uglify-js 3.4.0.

Reported upstream: mishoo/UglifyJS#3174

kzc commented

@mlwilkerson Please try out this patch on this and larger real life scenarios and report your findings.

--- a/lib/compress.js
+++ b/lib/compress.js
@@ -1384,7 +1384,12 @@ merge(Compressor.prototype, {
                     extract_candidates(expr.alternative);
                 } else if (expr instanceof AST_Definitions
                     && (compressor.option("unused") || !(expr instanceof AST_Const))) {
-                    expr.definitions.forEach(extract_candidates);
+                    var len = expr.definitions.length;
+                    var i = len - 200; // limit number of trailing variable defs for consideration
+                    if (i < 0) i = 0;
+                    for (; i < len; i++) {
+                        extract_candidates(expr.definitions[i]);
+                    }
                 } else if (expr instanceof AST_DWLoop) {
                     extract_candidates(expr.condition);
                     if (!(expr.body instanceof AST_Block)) {

Without patch:

$ /usr/bin/time bin/uglifyjs slow.js -c
function alpha(){"use strict";var obj42={alpha:"42",beta:"42",gamma:[42,42,[],"42","42"]},obj43={alpha:"43",beta:"43",gamma:[43,43,[],"43","43"]},obj44={alpha:"44",beta:"44",gamma:[44,44,[],"44","44"]};console.log(obj42.alpha),console.log(obj43.alpha),console.log(obj44.alpha)}alpha();
       40.96 real        41.08 user         0.06 sys

With patch:

$ /usr/bin/time bin/uglifyjs slow.js -c
function alpha(){"use strict";var obj42={alpha:"42",beta:"42",gamma:[42,42,[],"42","42"]},obj43={alpha:"43",beta:"43",gamma:[43,43,[],"43","43"]},obj44={alpha:"44",beta:"44",gamma:[44,44,[],"44","44"]};console.log(obj42.alpha),console.log(obj43.alpha),console.log(obj44.alpha)}alpha();
        0.58 real         0.66 user         0.04 sys

Minified output appears to be the same.

Thanks, @kzc. I'm on it.

@kzc I can confirm that this patch solves the problem I've reported. Supporting evidence:

Scenario 1: Webpack 4 bundle

  • imports one icon each from three icon packs
  • 875 var defs, shaken down to 3
  • with the patch
real	0m0.969s
user	0m1.257s
sys	0m0.097s

Scenario 2: Webpack 4 bundle

  • imports one icon each from two icon packs
  • imports collection of all icon objects from a third icon pack
  • 875 var defs, shaken down to 171
  • with the patch
real	0m0.974s
user	0m1.244s
sys	0m0.090s

Scenario 3: angular-cli "hello world"

  • imports one icon from one icon package
  • ~hundreds of var defs shaken down to one

without patch: 145s
with patch: 14s

Scenario 4: real world Angular project

  • tens of thousands of lines of code
  • importing from three icon packs (on the order of 1,000 var defs)
  • tree-shaken down to 51 used icon objects

without patch, deep imports: 55s
without patch, imports from main: 98s
with patch, imports from main: 56s

h/t @devoto13 for running some of these scenarios

kzc commented

@mlwilkerson Thanks for reporting your findings.

Speed and size aside, can I assume that all these minified programs ran correctly?

Yes, that's a safe assumption that they all ran correctly. In my many runs I've not always verified the correctness of the output, but any time that I have spot-checked (maybe 50% of the time), it's always looked correct (properly tree-shaken), and always executed correctly.

To be more certain, I did just re-verify the correctness of the minified programs for Scenario 1 and 2 above. Both ran correctly. I feel highly confident that Scenarios 3 and 4 would also be correct, but I have not verified. @devoto13 can you verify?

@mlwilkerson Yes, both bundles work correctly.

Thanks