Parsing escaped attributes shouldn't set `raw` property in resulting node

Question

Opened this issue 4 years ago · 1 comments

Example JSX code:

<span data-foo="&quot;" />

The relevant part of the ESTree:

openingElement: Node {
    type: 'JSXOpeningElement',
    attributes: [
        Node {
            type: 'JSXAttribute',
            name: Node {
                type: 'JSXIdentifier',
                name: 'data-foo'
            },
            value: Node {
                type: 'Literal',
                value: '"',
                raw: '"&quot;"' // <-- problem is here
            }
        }
    ],
    // ...
}

I believe the raw property of the literal is wrong. When transforming such a tree and preserving the literals as-is, some code generators (e.g. astring) will prefer the raw property and emit it as is:

https://github.com/davidbonnet/astring/blob/92d26a05f666fa4f7a3475df67773581c1dff9a0/src/astring.js#L938-L940

This may lead to generated code that looks as follows:

React.createElement(
  "span",
  { "data-foo": "&quot;" }
)

Naturally this is wrong: the attribute value should be a single double quote, and because React and other tools escape values themselves, the entity ends up escaped twice. Interestingly enough, escodegen appears to ignore the raw property.
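To make the difference concrete, here is a minimal sketch of the two generator behaviours described above. `emitLiteral` and its `preferRaw` flag are hypothetical names made up for illustration; this is not the actual code of astring or escodegen, only an approximation of "prefer raw" versus "re-serialize value".

```javascript
// Hypothetical literal emitter; NOT the real astring or escodegen code.
// preferRaw = true mimics a generator that writes `raw` verbatim,
// preferRaw = false mimics one that re-serializes `value`.
function emitLiteral(node, preferRaw) {
  if (preferRaw && typeof node.raw === "string") {
    return node.raw;
  }
  return JSON.stringify(node.value);
}

// The attribute literal from the tree in this issue.
const attributeValue = { type: "Literal", value: '"', raw: '"&quot;"' };

const fromRaw = emitLiteral(attributeValue, true);
const fromValue = emitLiteral(attributeValue, false);

console.log(fromRaw);   // "&quot;"
console.log(fromValue); // "\""
```

Only the second output is a JS string literal whose value is the single double quote the JSX author wrote.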

The naive solution would be to nuke all raw properties nested within any JSX node, but that would remove more than necessary:

<span data-foo={ "bla" } />

Here, the escaping rules are regular JS rules.

My proposal would be to remove raw from all literals that are nested directly below an attribute.
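As a workaround on the consumer side, the proposal can be sketched as a small post-processing pass. `stripAttributeLiteralRaw` is a hypothetical helper written for this comment, not part of acorn-jsx, and it assumes the tree is made of plain objects without parent pointers (otherwise the recursion would loop).

```javascript
// Hypothetical helper: delete `raw` only from Literal nodes that are the
// direct value of a JSXAttribute. Literals inside {...} are left alone.
// Assumes a tree of plain objects with no cyclic (parent) references.
function stripAttributeLiteralRaw(node) {
  if (Array.isArray(node)) {
    node.forEach(stripAttributeLiteralRaw);
    return node;
  }
  if (node === null || typeof node !== "object") {
    return node;
  }
  if (
    node.type === "JSXAttribute" &&
    node.value &&
    node.value.type === "Literal"
  ) {
    delete node.value.raw;
  }
  for (const key of Object.keys(node)) {
    stripAttributeLiteralRaw(node[key]);
  }
  return node;
}

// <span data-foo="&quot;" />
const escapedAttr = {
  type: "JSXAttribute",
  name: { type: "JSXIdentifier", name: "data-foo" },
  value: { type: "Literal", value: '"', raw: '"&quot;"' },
};

// <span data-foo={ "bla" } />
const expressionAttr = {
  type: "JSXAttribute",
  name: { type: "JSXIdentifier", name: "data-foo" },
  value: {
    type: "JSXExpressionContainer",
    expression: { type: "Literal", value: "bla", raw: '"bla"' },
  },
};

stripAttributeLiteralRaw([escapedAttr, expressionAttr]);
console.log("raw" in escapedAttr.value);          // false
console.log(expressionAttr.value.expression.raw); // "bla"
```

The literal inside the expression container keeps its raw, since regular JS escaping rules apply there.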

The following situation is not affected:

<span>&</span>

... for the sole reason that text nested in elements is not parsed as a literal, but rather as the separate JSXText node type.

Answer 1 · 2020-12-10T19:47:36.000Z

I don't understand why acorn-jsx is transforming HTML entities at all. That is up to the user agent to do, not the JSX parser.

I have &lt;/&gt; in my JSX, used within an icon. Parsing this within a bundler and then feeding it into the JSX compiler I'm using (not React in my case) causes the compiler to think there's a literal </> there, when in fact it's simply text intended for the user agent.

What is the use case for transforming HTML entities? JSX isn't HTML; the original, unaltered text should be put into the AST nodes, not arbitrarily transformed text. HTML entities are not string escapes according to either the ECMAScript standard or any of the JSX "standards".

This appears to happen even in normal text, too.

<div className={C.icon}><center>&lt;/&gt;</center></div>
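A possible stopgap for a compiler that wants the unaltered text, sketched under one loud assumption: that the JSXText node carries a `raw` source slice next to the decoded `value`, the same way the attribute Literal at the top of this issue does. That is not confirmed anywhere in this thread, and `sourceTextOf` is a hypothetical helper, not an acorn-jsx API.

```javascript
// Hypothetical helper: prefer the untouched source slice when the node has
// one, and fall back to the (entity-decoded) value otherwise.
// ASSUMPTION: text nodes expose `raw` like the Literal shown in this issue.
function sourceTextOf(node) {
  return typeof node.raw === "string" ? node.raw : String(node.value);
}

// Hand-written stand-in for the text inside <center>...</center>.
const textNode = { type: "JSXText", value: "</>", raw: "&lt;/&gt;" };

console.log(sourceTextOf(textNode)); // &lt;/&gt;
```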