atom/language-java

Textmate grammar freezes with long string constant

From @swbrenneis in microsoft/vscode#79640

Paste the following into a Java file:

package org.mytest;

class MyTest {
    void doMyTest() {
        testString =  "A900102008Exx xxxxxxxxxxxxxxxxxxxxxx          T1201801210000P0000000002421000002Exxxxxx                                                     P021000021JPxxxxxxxxxxxxxxxxx                 021000021xxxxxxxxxxxxxxxxxxx                 0000000222100D                                   P021000089xxxxxxxxxxxx                        021000089xxxxxxxxxxxx                        0000000012100C                                   P051000017xxxxxxxxxxxxxxx                     051000017xxxxxxxxxxxxxxx                     0000000020000C                                   P081000032xxxxxxxxxxxxxxx                     081000032xxxxxxxxxxxxxxx                     0000000020000D                                   P121000248xxxxxxxxxxxxxxxx                    121000248xxxxxxxxxxxxxxxx                    0000000210000C                                   ";
    }
}

The grammar will cause a freeze.

Thanks for reporting, I will have a look.

I root-caused the issue: the variable pattern causes the problem. I will try to fix it.

This seems to be caused by the 'class' rule:

'class':
  'begin': '(?=\\w?[\\w\\s-]*\\b(?:class|(?<!@)interface|enum)\\s+[\\w$]+)'
  'end': '}'
  'endCaptures':
    '0':
      'name': 'punctuation.section.class.end.bracket.curly.java'
  'name': 'meta.class.java'
  'patterns': [
    {
      'include': '#storage-modifiers'
    }
    {
      'include': '#generics'
    }
    {
      'include': '#comments'
    }
    {
      'captures':
        '1':
          'name': 'storage.modifier.java'
        '2':
          'name': 'entity.name.type.class.java'
      'match': '(class|(?<!@)interface|enum)\\s+([\\w$]+)'
      'name': 'meta.class.identifier.java'
    }
    {
      'begin': 'extends'
      'beginCaptures':
        '0':
          'name': 'storage.modifier.extends.java'
      'end': '(?={|implements|permits)'
      'name': 'meta.definition.class.inherited.classes.java'
      'patterns': [
        {
          'include': '#object-types-inherited'
        }
        {
          'include': '#comments'
        }
      ]
    }
    {
      'begin': '(implements)\\s'
      'beginCaptures':
        '1':
          'name': 'storage.modifier.implements.java'
      'end': '(?=\\s*extends|permits|\\{)'
      'name': 'meta.definition.class.implemented.interfaces.java'
      'patterns': [
        {
          'include': '#object-types-inherited'
        }
        {
          'include': '#comments'
        }
      ]
    }
    {
      'begin': '(permits)\\s'
      'beginCaptures':
        '1':
          'name': 'storage.modifier.permits.java'
      'end': '(?=\\s*extends|implements|\\{)'
      'name': 'meta.definition.class.permits.classes.java'
      'patterns': [
        {
          'include': '#object-types-inherited'
        }
        {
          'include': '#comments'
        }
      ]
    }
    {
      'begin': '{'
      'beginCaptures':
        '0':
          'name': 'punctuation.section.class.begin.bracket.curly.java'
      'end': '(?=})'
      'contentName': 'meta.class.body.java'
      'patterns': [
        {
          'include': '#class-body'
        }
      ]
    }
  ]
'class-body':

and record:
'record':
  'begin': '(?=\\w?[\\w\\s]*\\b(?:record)\\s+[\\w$]+)'
  'end': '}'
  'endCaptures':
    '0':
      'name': 'punctuation.section.class.end.bracket.curly.java'
  'name': 'meta.record.java'
  'patterns': [
    {
      'include': '#storage-modifiers'
    }
    {
      'include': '#generics'
    }
    {
      'include': '#comments'
    }
    {
      'begin': '(record)\\s+([\\w$]+)(<[\\w$]+>)?(\\()'
      'beginCaptures':
        '1':
          'name': 'storage.modifier.java'
        '2':
          'name': 'entity.name.type.record.java'
        '3':
          'patterns': [
            {
              'include': '#generics'
            }
          ]
        '4':
          'name': 'punctuation.definition.parameters.begin.bracket.round.java'
      'end': '\\)'
      'endCaptures':
        '0':
          'name': 'punctuation.definition.parameters.end.bracket.round.java'
      'name': 'meta.record.identifier.java'
      'patterns': [
        {
          'include': '#code'
        }
      ]
    }
    {
      'begin': '(implements)\\s'
      'beginCaptures':
        '1':
          'name': 'storage.modifier.implements.java'
      'end': '(?=\\s*\\{)' # by design, records cannot extend any other class
      'name': 'meta.definition.class.implemented.interfaces.java'
      'patterns': [
        {
          'include': '#object-types-inherited'
        }
        {
          'include': '#comments'
        }
      ]
    }
    {
      'include': '#record-body'
    }
  ]

Commenting both out reduces the lag massively.

The [\\w\\s]* and \\b(?:class|(?<!@)interface|enum)\\s+ parts clash quite badly, causing catastrophic backtracking when no class/record keyword is found on a standalone word.
Combine that with the begin rule being a zero-width lookahead with no outside anchoring points, so it is retried at every position in the line.
That is not a good recipe for performance.
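
The cost described above can be observed with a rough sketch that runs the original 'class' begin pattern over keyword-free input of growing length. This uses java.util.regex as a stand-in for the Oniguruma engine that TextMate grammars run on, so absolute timings are not comparable; the class name, the helper methods, and the filler text are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

public class LookaheadScanCost {
    // 'begin' pattern of the 'class' rule, with the CSON escaping carried over to a Java string.
    static final Pattern CLASS_BEGIN = Pattern.compile(
        "(?=\\w?[\\w\\s-]*\\b(?:class|(?<!@)interface|enum)\\s+[\\w$]+)");

    // Builds a run of word characters and spaces with no class/interface/enum keyword,
    // similar in shape to the string constant in the report.
    static String filler(int length) {
        StringBuilder sb = new StringBuilder(length + 32);
        while (sb.length() < length) {
            sb.append("P021000021xxxxxxxx      ");
        }
        sb.setLength(length);
        return sb.toString();
    }

    // Times one full scan. find() retries the unanchored lookahead at every start
    // offset, and each attempt lets [\w\s-]* run to the end before backing off.
    static long scanNanos(String input) {
        long start = System.nanoTime();
        boolean found = CLASS_BEGIN.matcher(input).find();
        long elapsed = System.nanoTime() - start;
        if (found) {
            throw new IllegalStateException("filler unexpectedly matched");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 4000; n *= 2) {
            System.out.printf("%d chars: %d us%n", n, scanNanos(filler(n)) / 1000);
        }
    }
}
```

If the explanation above holds, the measured time should grow noticeably faster than the input length, since every start offset pays for a scan of the remaining characters. JIT warm-up will distort the first measurement, so the numbers are only indicative.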

Changing (?=\\w?[\\w\\s-]*\\b(?:class|(?<!@)interface|enum)\\s+[\\w$]+)
to (?=(?>(?!\\b(?>class|(?<!@)interface|enum)\\s)\\w++|[\\w\\s-&&[^cei]]++)*+\\b(?>class|(?<!@)interface|enum)\\s+[\\w$])
helps a lot, but does not remove the lag entirely.
The same goes for record: (?=\\w?[\\w\\s]*\\b(?:record)\\s+[\\w$]+)
=> (?=(?>(?!\\brecord\\s)\\w++|[\\w\\s&&[^r]]++)*+\\brecord\\s+[\\w$])
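
As a sanity check on the rewrite, the sketch below compares the original and the possessive/atomic 'class' patterns on a few short sample lines. It again uses java.util.regex rather than Oniguruma, on the assumption that Java treats atomic groups, possessive quantifiers, and the && character-class intersection the same way for these inputs; the class name, the agree helper, and the sample lines are hypothetical. It only checks whether each pattern matches somewhere in the line, not where, and it says nothing about the remaining lag.

```java
import java.util.regex.Pattern;

public class PossessiveClassBegin {
    // Original 'begin' lookahead from the 'class' rule.
    static final Pattern ORIGINAL = Pattern.compile(
        "(?=\\w?[\\w\\s-]*\\b(?:class|(?<!@)interface|enum)\\s+[\\w$]+)");

    // Proposed rewrite: whole words are consumed possessively unless they are
    // a keyword followed by whitespace, so the prefix is never backtracked into.
    static final Pattern REWRITTEN = Pattern.compile(
        "(?=(?>(?!\\b(?>class|(?<!@)interface|enum)\\s)\\w++|[\\w\\s-&&[^cei]]++)*+"
        + "\\b(?>class|(?<!@)interface|enum)\\s+[\\w$])");

    // True when both patterns give the same yes/no answer for the line.
    static boolean agree(String line) {
        return ORIGINAL.matcher(line).find() == REWRITTEN.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "public final class MyTest {",
            "interface Shape extends Area {",
            "@interface Marker {",
            "enum Color { RED }",
            "int classCount = 0;",
            "testString = \"A900102008Exx xxxxxxxx   T1201801210000P\";"
        };
        for (String s : samples) {
            System.out.printf("%-55s original=%b rewritten=%b%n",
                s, ORIGINAL.matcher(s).find(), REWRITTEN.matcher(s).find());
        }
    }
}
```

The samples are kept short on purpose: Java evaluates a repeated group recursively, so very long inputs could exhaust the stack with the rewritten pattern, which is a property of java.util.regex and not necessarily of the engine the grammar runs on.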
