A Lezer grammar for parsing Python regular expressions with incremental parsing support and TypeScript definitions.
npm i lezer-python-regex

- Basic patterns, character classes, quantifiers, groups
- Lookarounds, backreferences, conditionals, alternation
- Inline flags, embedded comments, escape sequences
- Named groups (?P<name>), atomic groups (?>)
- Broad coverage of Python regex syntax (groups, lookarounds, conditionals, inline flags, character classes, octal/hex/unicode escapes, anchors including \A and \Z, possessive quantifiers and atomic groups from Python 3.11+)
import { parser } from "lezer-python-regex";
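// String.raw keeps the backslashes; a plain template literal would turn \w and \s into "w" and "s".
const tree = parser.parse(String.raw`(?P<word>\w+)\s+(?P=word)`);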
console.log(tree.toString());

import { parser, pythonRegexHighlighting } from "lezer-python-regex";
import { HighlightStyle, LRLanguage, syntaxHighlighting } from "@codemirror/language";
const pythonRegexLanguage = LRLanguage.define({
  parser,
  languageData: { name: "python-regex" },
});
const highlightStyle = HighlightStyle.define([pythonRegexHighlighting]);
const extensions = [pythonRegexLanguage, syntaxHighlighting(highlightStyle)];

import { parser } from "lezer-python-regex";
import * as terms from "lezer-python-regex";
const tree = parser.parse(`(?P<email>[^@]+@[^@]+)`);
const cursor = tree.cursor();
// Find named groups
cursor.iterate((node) => {
  if (node.type.id === terms.NamedCapturingGroup) {
    console.log("Named group found:", node);
  }
});

import { parser } from "lezer-python-regex";
function parseWithErrors(pattern: string) {
  const tree = parser.parse(pattern);
  const errors: any[] = [];
  tree.cursor().iterate((node) => {
    if (node.type.isError) {
      errors.push({
        from: node.from,
        to: node.to,
        message: `Syntax error at ${node.from}-${node.to}`,
      });
    }
  });
  return { tree, errors };
}

- parser - Lezer parser instance
- pythonRegexHighlighting - CodeMirror syntax highlighting
- Grammar terms - Node type constants for tree navigation
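The optional `fragments` argument of `parser.parse` is what enables incremental reparsing: fragments of a previous tree are reused wherever the text did not change. To build them, Lezer needs to know which range of the old text maps to which range of the new text. The sketch below is one simple way to derive that range, assuming a single contiguous edit; `diffRange` is a hypothetical helper (not part of this package), and the `ChangedRange` interface is redeclared locally to mirror the shape used by `@lezer/common`.

```typescript
// Mirrors the { fromA, toA, fromB, toB } shape of ChangedRange in @lezer/common.
// "A" offsets refer to the old text, "B" offsets to the new text.
interface ChangedRange {
  fromA: number;
  toA: number;
  fromB: number;
  toB: number;
}

// Hypothetical helper: find the single changed span between two versions of a
// pattern by trimming their common prefix and common suffix.
function diffRange(oldText: string, newText: string): ChangedRange | null {
  if (oldText === newText) return null;

  const shorter = Math.min(oldText.length, newText.length);

  // Length of the shared prefix.
  let prefix = 0;
  while (prefix < shorter && oldText[prefix] === newText[prefix]) prefix++;

  // Length of the shared suffix, never overlapping the prefix.
  let suffix = 0;
  const maxSuffix = shorter - prefix;
  while (
    suffix < maxSuffix &&
    oldText[oldText.length - 1 - suffix] === newText[newText.length - 1 - suffix]
  ) {
    suffix++;
  }

  return {
    fromA: prefix,
    toA: oldText.length - suffix,
    fromB: prefix,
    toB: newText.length - suffix,
  };
}

// Usage sketch (not executed here; needs @lezer/common and this package):
//   const range = diffRange(oldText, newText);
//   const fragments = range
//     ? TreeFragment.applyChanges(TreeFragment.addTree(oldTree), [range])
//     : TreeFragment.addTree(oldTree);
//   const newTree = parser.parse(newText, fragments);
```

Regex patterns are usually short, so a full reparse is cheap; reusing fragments mostly pays off when the pattern lives inside a larger editor document.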
parser.parse(input: string, fragments?: TreeFragment[], ranges?: {from: number, to: number}[]): Tree

git clone https://github.com/Sec-ant/lezer-python-regex
cd lezer-python-regex
pnpm install
pnpm build
pnpm test

Commands:
- pnpm test:run - Run all tests
- pnpm test:ui - Interactive test UI
- Fork the repository
- Create a feature branch
- Add tests in tests/fixtures/
- Ensure tests pass
- Submit a pull request
MIT
- Inline flags without a scope (e.g. (?ims)) are parsed but their required position at the start of the pattern (per Python 3.11+) isn’t enforced by the grammar.
- Verbose mode semantics (re.X / (?x)), that is, whitespace skipping and # comments outside character classes, are not modeled; only (?#...) embedded comments are recognized.
- Lookbehind fixed-length requirement isn’t validated by the grammar.
- Group existence for numbered/named backreferences isn’t validated by the grammar.
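Because these checks are left out of the grammar, they have to run as a separate pass over the parsed tree. As an illustration for the backreference case, the sketch below flags references whose target group is missing or has not yet closed, which approximates the cases Python's `re` rejects at compile time. All names here (`GroupSpan`, `BackrefSpan`, `checkBackrefs`) are hypothetical; the spans are assumed to have been collected beforehand by walking the tree, with numbered groups represented by their number as a string.

```typescript
// Hypothetical span shapes, assumed to be collected from the syntax tree.
interface GroupSpan {
  name: string; // group name, or the group number as a string
  from: number; // offset of the opening parenthesis
  to: number;   // offset just past the closing parenthesis
}

interface BackrefSpan {
  name: string; // referenced group name or number
  from: number;
  to: number;
}

interface BackrefIssue {
  backref: BackrefSpan;
  reason: "unknown-group" | "open-or-later-group";
}

// Report backreferences that do not point at a group already closed
// before the reference begins.
function checkBackrefs(
  groups: readonly GroupSpan[],
  backrefs: readonly BackrefSpan[],
): BackrefIssue[] {
  const issues: BackrefIssue[] = [];
  for (const backref of backrefs) {
    const sameName = groups.filter((g) => g.name === backref.name);
    if (sameName.length === 0) {
      issues.push({ backref, reason: "unknown-group" });
      continue;
    }
    // The group must end at or before the position where the reference starts.
    const closedBefore = sameName.some((g) => g.to <= backref.from);
    if (!closedBefore) {
      issues.push({ backref, reason: "open-or-later-group" });
    }
  }
  return issues;
}
```

The returned issues carry the original offsets, so they can be reported alongside the syntax errors collected from error nodes.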