Regression in selector parsing: Attribute selectors not parsed correctly

Thank you for this fork of css. It's very helpful to us.

While testing, I think I have found a regression in the parser, introduced with commit 24ed6e7:

Lines 215 to 239 in a0efaee

    
                   /** 
        
                    * replace ',' by \u200C for data selector (div[data-lang="fr,de,us"]) 
        
                    * replace ',' by \u200C for nthChild and other selector (div:nth-child(2,3,4)) 
        
                    * 
        
                    * Examples: 
        
                    * div[data-lang="fr,\"de,us"] 
        
                    * div[data-lang='fr,\'de,us'] 
        
                    * div:matches(.toto, .titi:matches(.toto, .titi)) 
        
                    * 
        
                    * Regex logic: 
        
                    *  ("|')(?:\\\1|.)*?,(?:\\\1|.)*?\1 => Handle the " and ' 
        
                    *  \(.*?,.*?\)  => Handle the () 
        
                    * 
        
                    * Optimization 0: 
        
                    * No greedy capture (see docs about the difference between .* and .*?) 
        
                    * 
        
                    * Optimization 1: 
        
                    *  \(.*?,.*?\) instead of \(.*?\) to limit the number of replace (don't need to replace if , is not in the string) 
        
                    * 
        
                    * Optimization 2: 
        
                    * ("|')(?:\\\1|.)*?,(?:\\\1|.)*?\1 this use reference to capture group, it work faster. 
        
                    */ 
        
                   .replace(/("|')(?:\\\1|.)*?,(?:\\\1|.)*?\1|\(.*?,.*?\)/g, m => 
        
                     m.replace(/,/g, '\u200C') 
        
                   )

The regular expression /("|')(?:\\\1|.)*?,(?:\\\1|.)*?\1|\(.*?,.*?\)/g seems to be too greedy for cases like div[class='foo'],div[class='bar']. For this example, it captures 'foo'],div[class=', leading to an incorrect replacement of the comma that separates the two selectors and is not part of the data selector.

The original regular expression, used before this change (/"(?:\\"|[^"])*"|'(?:\\'|[^'])*'/g) handles this case correctly by matching 'foo' and 'bar'.

Ultimately, the current behavior leads to an incorrect AST, where instead of two selectors, only one (incorrect) selector is listed:

"type": "rule",
"selectors": [
    "div[data-value='foo'],div[data-value='bar']",
],

Expected:

"type": "rule",
"selectors": [
    "div[data-value='foo']",
    "div[data-value='bar']"
],

I have attached a test case demonstrating this. You can extract the archive directly into test/cases/:
case - selectors-attributes.zip

Thanks a lot for the report, I will work on it as soon as I get some time.

Hi @RandomByte , could you check if this PR fixes your problems?

#88
I plan to merge it within a week if I found no issues with it.

@holblin thanks! I can confirm that it resolves all issues we were facing.

The new version is published in npm 👍
Thanks for the report! It was much appreciated :-)

	/**
	* replace ',' by \u200C for data selector (div[data-lang="fr,de,us"])
	* replace ',' by \u200C for nthChild and other selector (div:nth-child(2,3,4))
	*
	* Examples:
	* div[data-lang="fr,\"de,us"]
	* div[data-lang='fr,\'de,us']
	* div:matches(.toto, .titi:matches(.toto, .titi))
	*
	* Regex logic:
	* ("\|')(?:\\\1\|.)?,(?:\\\1\|.)?\1 => Handle the " and '
	* \(.?,.?\) => Handle the ()
	*
	* Optimization 0:
	* No greedy capture (see docs about the difference between .* and .*?)
	*
	* Optimization 1:
	* \(.?,.?\) instead of \(.*?\) to limit the number of replace (don't need to replace if , is not in the string)
	*
	* Optimization 2:
	* ("\|')(?:\\\1\|.)?,(?:\\\1\|.)?\1 this use reference to capture group, it work faster.
	*/
	.replace(/("\|')(?:\\\1\|.)?,(?:\\\1\|.)?\1\|\(.?,.?\)/g, m =>
	m.replace(/,/g, '\u200C')
	)