babel/minify

Do not escape characters in string literals when they are supported by the specified encoding

tyrak opened this issue · 8 comments

tyrak commented

I was experimenting with babili and found that the minified code it produces is significantly larger than Closure's (120 KiB for babili vs. 100 KiB for Closure). As it turns out, the problem is caused by the way babili handles non-ASCII (Unicode) string literals.

Suppose the code contains the string "теѕт" (all Cyrillic characters). Babili converts it to "\u0442\u0435\u0455\u0442". Closure with the --charset utf8 option, on the other hand, leaves the string in its original form. In fact, with that flag, Closure converts "\u0442\u0435\u0455\u0442" to "теѕт".

So I propose adding to babili an option similar to Closure's --charset. Of course, it should default to a conservative setting (e.g. ascii), because otherwise the minified script would have to be loaded with charset="..." in the <script> tag.
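A minimal sketch of what such an option might do when printing a string literal, assuming a hypothetical charset setting with the values "ascii" and "utf8". Neither the setting nor the function names below exist in babili; they are illustrative only. Characters the chosen charset cannot carry are written as \uXXXX escapes, and everything else is emitted as-is:

```javascript
// Hypothetical sketch: print a string value as a double-quoted JS literal.
// `charset` is an assumed option ("ascii" or "utf8") modeled on Closure's --charset;
// it is not an existing babili setting.
function printStringLiteral(value, charset = "ascii") {
  let out = '"';
  for (const ch of value) { // for...of iterates by code point, so astral chars stay whole
    const cp = ch.codePointAt(0);
    if (ch === '"' || ch === "\\") {
      out += "\\" + ch; // the delimiter and backslash are always escaped
    } else if (cp < 0x20 || cp === 0x2028 || cp === 0x2029 || (cp >= 0xd800 && cp <= 0xdfff)) {
      out += escapeUnits(ch); // control chars, line separators, lone surrogates
    } else if (cp > 0x7e && charset === "ascii") {
      out += escapeUnits(ch); // an ASCII-only file cannot carry this character
    } else {
      out += ch; // printable ASCII, or anything the chosen charset can encode
    }
  }
  return out + '"';
}

// Write each UTF-16 code unit of `ch` as \uXXXX (astral chars become a surrogate pair).
function escapeUnits(ch) {
  let s = "";
  for (let i = 0; i < ch.length; i++) {
    s += "\\u" + ch.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return s;
}

console.log(printStringLiteral("теѕт", "ascii")); // "\u0442\u0435\u0455\u0442"
console.log(printStringLiteral("теѕт", "utf8"));  // "теѕт"
```

Escaping U+2028 and U+2029 even in utf8 mode keeps the output valid for engines older than ES2019, where those characters could not appear raw inside string literals.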

j-f1 commented

Good idea!

Internally, there could be a cost function: string => number. For UTF-8 and UTF-16, the cost of теѕт would be 8 bytes:

UTF-8 : D1 82 D0 B5 D1 95 D1 82
UTF-16: 04 42 04 35 04 55 04 42

For ASCII, it would be 24, since each character has to be written as a six-character \uXXXX escape:

\u0442\u0435\u0455\u0442
123456789012345678901234
        10        20  24

Babili could then run the script through each of the encodings and pick whichever produces the shortest output.
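A rough sketch of that cost function and the selection step. The names encodedCost and cheapestEncoding are hypothetical, and the sketch measures only the characters themselves, ignoring quotes, escaped backslashes and any byte-order mark:

```javascript
// Hypothetical cost function: bytes needed to store `text` in the given output encoding.
function encodedCost(text, encoding) {
  switch (encoding) {
    case "ascii": {
      // Every UTF-16 code unit outside ASCII must be written as a 6-character \uXXXX escape.
      let cost = 0;
      for (let i = 0; i < text.length; i++) {
        cost += text.charCodeAt(i) < 0x80 ? 1 : 6;
      }
      return cost;
    }
    case "utf-8":
      return new TextEncoder().encode(text).length;
    case "utf-16":
      // Two bytes per code unit (no BOM); astral characters take two code units.
      return 2 * text.length;
    default:
      throw new Error(`unknown encoding: ${encoding}`);
  }
}

// Pick the allowed encoding that yields the smallest output (ties keep the earlier entry).
function cheapestEncoding(text, allowed = ["ascii", "utf-8", "utf-16"]) {
  let best = allowed[0];
  for (const enc of allowed.slice(1)) {
    if (encodedCost(text, enc) < encodedCost(text, best)) best = enc;
  }
  return best;
}

console.log(encodedCost("теѕт", "ascii"));  // 24
console.log(encodedCost("теѕт", "utf-8"));  // 8
console.log(encodedCost("теѕт", "utf-16")); // 8
```

Because ties go to the earlier entry, listing "ascii" first keeps the conservative default whenever switching encodings would not save anything.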

How do I disable this string-literal-mangling feature? It's bloating my data-heavy files by a significant amount.

This file balloons up monstrously when run through babel-minify: https://github.com/TehShrike/majority-text-family-35-revelation/blob/master/revelation.json

Any chance this will be added before v1.0? Escaping UTF-8 characters in strings significantly increases the bundle size.

Cyp commented

This seems to be something babel-core is doing, rather than babel-minify. When the 'minify' preset is used, minified: true is set by default.

Passing

{"presets": ["minify", "env"], "minified": false}

instead of

{"presets": ["minify", "env"]}

seems to be a workaround, although it results in more spaces in the output.

tyrak commented

It shouldn't matter what the output of babel-core is. If babel-minify sees a string literal like "\u0442\u0435\u0455\u0442" in its input, and its configuration says that UTF-8 output is acceptable, it should simply output "теѕт".

I understand that you are only suggesting a workaround; I just wanted to head off any misconception that this is not a babel-minify bug.

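Minify really does make code larger for UTF-8 files. For now, you can post-process its output with this (it decodes only non-ASCII \uXXXX escapes; ASCII, surrogate and U+2028/U+2029 escapes are left alone so the string literals stay valid):
code = code.replace(/\\u([0-9a-f]{4})/gi, (m, hex) => { const cp = parseInt(hex, 16); return cp > 0x7f && (cp < 0xd800 || cp > 0xdfff) && cp !== 0x2028 && cp !== 0x2029 ? String.fromCharCode(cp) : m; });
Note that this is a plain text substitution, not a parse: an escaped backslash followed by u (e.g. "\\u0442" in the source) is still rewritten.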

> This seems to be something babel-core is doing, rather than babel-minify.

Your workaround helps; however, babel-minify does this as well, even when babel-core did not. At least it does when used through babel-minify-webpack-plugin, where minification is a separate stage...