XmlParser

A Roslyn-inspired full-fidelity XML parser with no dependencies and a simple Visual Studio XML language service.

  • The parser produces a full-fidelity syntax tree: every character of the source text is represented in the tree.
  • The parser has no dependencies and can easily be made portable. I would appreciate a high-quality pull request that makes the parser portable.
  • The parser is based on the section of the Roslyn VB parser that parses XML literals. The Roslyn code was ported to C# and made standalone.
  • The parser is error-tolerant. It still produces a full tree from invalid XML with missing tags, extra invalid text, and so on. Missing and skipped tokens are represented in the tree.
  • The resulting tree is immutable and follows Roslyn's green/red separation for maximum reusability of nodes.
  • The parser has basic support for incrementality. Given a previously constructed tree and a list of changes, it tries to reuse existing nodes and re-creates only what is necessary.
  • This library is more low-level than XLinq (for instance, XLinq doesn't seem to represent whitespace around attributes). It also knows nothing about XML namespaces and just tells you what's in the source text (whereas XLinq has too much ceremony around XML namespaces).
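
A quick way to see the full-fidelity and error-tolerance claims in action is a round-trip check. The sketch below is only an illustration, not part of the library. It uses `Parser.ParseText` and `ToFullString`, both of which appear in the FAQ further down. The `RoundTrips` helper and the sample strings are made up for this example, and the `Microsoft.Language.Xml` namespace is assumed from the repository layout.

```csharp
using System;
using Microsoft.Language.Xml; // assumed namespace, taken from the repository folder name

// Hypothetical helper: true when the tree reproduces the input text exactly.
static bool RoundTrips(string xml)
{
    XmlDocumentSyntax root = Parser.ParseText(xml);
    return root.ToFullString() == xml;
}

// Well-formed input with unusual whitespace around the attribute.
string wellFormed = "<a  x = \"1\" >\n  <b/>\n</a>";

// Malformed input: <b> is never closed and stray text follows the root.
string malformed = "<a><b></a> stray text";

Console.WriteLine(RoundTrips(wellFormed));
Console.WriteLine(RoundTrips(malformed));
```

If the properties listed above hold, both checks should succeed, because whitespace, skipped text, and missing tokens are all kept in the tree rather than dropped.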

This is a work in progress and by no means complete. Specifically:

  • XML DTD is not supported (Roslyn didn't support it either).
  • The code hasn't been tuned for performance and allocations; I'm sure a lot can be done to reduce the memory consumption of the resulting tree. It should be pretty efficient though.
  • We reserve the right to accept only very high-quality pull requests. We have very limited time to work on this, so we ask everybody to please respect that.

Download from NuGet:

Try it!

https://xmlsyntaxvisualizer.azurewebsites.net/index.html

The above app leverages the parser and can help you visualize the resulting syntax tree generated from an XML document.

Code is available at https://github.com/garuma/XmlSyntaxVisualizer

C# UWP example at https://github.com/michael-hawker/XmlSyntaxVisualizerUWP

Also see the blog post: https://blog.neteril.org/blog/2018/03/21/xml-parsing-roslyn/

Resources about Immutable Syntax Trees: https://github.com/KirillOsenkov/Bliki/wiki/Roslyn-Immutable-Trees

FAQ:

How to find a node in the tree given a position in the source text?

https://github.com/KirillOsenkov/XmlParser/blob/master/src/Microsoft.Language.Xml/Utilities/SyntaxLocator.cs#L24

SyntaxLocator.FindNode(SyntaxNode node, int position);
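
As a rough usage sketch, the call above can be combined with `Parser.ParseText` to look up whatever sits at a given offset. The sample XML and the way the position is computed are made up for illustration. The sketch assumes that the document returned by `ParseText` can be passed as the `SyntaxNode` argument and that `FindNode` may return null when nothing is found.

```csharp
using System;
using Microsoft.Language.Xml; // assumed namespace, taken from the repository folder name

string xml = "<Project><PropertyGroup><TargetFramework>net8.0</TargetFramework></PropertyGroup></Project>";
XmlDocumentSyntax root = Parser.ParseText(xml);

// Offset of the first character of the text "net8.0" in the source.
int position = xml.IndexOf("net8.0", StringComparison.Ordinal);

// Ask the locator which node covers that offset.
var node = SyntaxLocator.FindNode(root, position);

// Print the runtime type of the result instead of assuming a particular node kind.
Console.WriteLine(node?.GetType().Name ?? "(no node found)");
```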

How to replace a node in the tree?

var original = """
               <Project Sdk="Microsoft.NET.Sdk">
                 <PropertyGroup>
                   <TargetFramework>net8.0</TargetFramework>
                 </PropertyGroup>
               </Project>
               """;

var expected = """
               <Project Sdk="Microsoft.NET.Sdk">
                 <PropertyGroup>
                   <TargetFramework>net9.0</TargetFramework>
                 </PropertyGroup>
               </Project>
               """;

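// Assumes: using System.Linq; using Microsoft.Language.Xml; using Xunit;

// Parse the original text into a syntax tree.
XmlDocumentSyntax root = Parser.ParseText(original);

// Find the <TargetFramework> element and its single text child.
XmlElementSyntax syntaxToReplace = root
    .Descendants()
    .OfType<XmlElementSyntax>()
    .Single(n => n.Name == "TargetFramework");
SyntaxNode textSyntaxToReplace = syntaxToReplace.Content.Single();

// Build the replacement text node.
XmlTextSyntax content = SyntaxFactory.XmlText(SyntaxFactory.XmlTextLiteralToken("net9.0", null, null));

// ReplaceNode returns a new tree, which is assigned back to root.
root = root.ReplaceNode(textSyntaxToReplace, content);

// The new tree serializes to the expected text.
Assert.Equal(expected, root.ToFullString());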
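
Because the tree is immutable, the same replace call can also be used to compare the tree before and after the edit. The following is a minimal sketch that reuses only the calls from the example above. The variable names and the one-element sample document are made up for illustration, and the `Microsoft.Language.Xml` namespace is assumed from the repository layout.

```csharp
using System;
using System.Linq;
using Microsoft.Language.Xml; // assumed namespace, taken from the repository folder name

string xml = "<TargetFramework>net8.0</TargetFramework>";
XmlDocumentSyntax before = Parser.ParseText(xml);

// Locate the only element in the document and its single text child.
XmlElementSyntax element = before.Descendants().OfType<XmlElementSyntax>().Single();
SyntaxNode oldText = element.Content.Single();

// Build the replacement text node the same way as in the example above.
XmlTextSyntax newText = SyntaxFactory.XmlText(SyntaxFactory.XmlTextLiteralToken("net9.0", null, null));

// Keep the result in a separate variable so both trees stay reachable.
XmlDocumentSyntax after = before.ReplaceNode(oldText, newText);

Console.WriteLine(before.ToFullString());
Console.WriteLine(after.ToFullString());
```

With an immutable tree, the first line printed should still show the original text and only the second should show the edit, so older trees can safely be kept around, for example for undo or for comparing versions.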