dotnet/roslyn

Allow for a way to ensure some source generators run before / after others

pranavkm opened this issue Β· 84 comments

We briefly discussed this during 6.0 / C# 10 but ultimately punted it since the ask came in too late. We've had issues where user-written source generators can no longer observe the output of source generated by Razor's generator:

For 6, the solution we gave our users is to turn off the source generator and fall back to our 2-phase compilation process. We'd made some progress discussing this internally, but filing this for C# 11 consideration.

I really want to use this feature with Blazor, but it seems like I have to rely on AOP frameworks instead.

jcouv commented

Another instance of this problem which was reported is using RegexGenerator in a Blazor project.

Yeah, I've been trying to build a source generator to help with Blazor projects, but I think what's happening is that it needs to run before the generator that transforms the .razor files, so that when their generator runs, the code I've generated that's used in them is available.

+1 for this.

We've been using generators for Blazor to generate a strongly typed class that contains all routes of the application, so routing isn't stringly typed and is much less prone to human error. This relies on us finding route attributes on the files generated by the Razor compiler. The current workaround for us is to define the RouteAttribute in a code-behind so our source generator can see it, but ideally it'd be nice to keep using the @page directive for this.

I think we might want to get some traction or at least ideas going on how this might be achieved.

In general I see that there are at least two ways this could work.

  1. Re-run generators on the output of all other generators until a steady state is achieved

This would probably be the easiest way to implement. However, this already sounds like a really bad idea: if you build two generators that always trigger each other, you get an infinite loop. It would, however, allow generators to use each other, with the possibility of terminating after a maximum number of round trips.

  2. Actually order generators properly.

The way this could work is by introducing, for example, an interface:

public interface IOrderedGenerator
{
   IEnumerable<string> GetBeforeGenerators();
   IEnumerable<string> GetAfterGenerators();
}

With the returned strings understood as assembly names.

This would allow a generator to declare both before and after targets.

  • Before generators are relevant for generators that themselves generate attributes that are understood by other generators.
  • After generators make sense in the context of generators like the Razor source generator that create code from additional files.

This method would also make it easy to detect loops, allow parallel execution of generators that aren't part of the same dependency graph, etc.
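As a sketch, the host could topologically sort generators from those declarations and fail on cycles. Note that IOrderedGenerator and the assembly-name convention are the proposal above, not an existing Roslyn API; this is just one way a depth-first ordering with cycle detection could look:

```csharp
// Depth-first topological sort over declared "runs after" edges.
// Throws when a cycle is detected; generators in independent subgraphs
// could still be scheduled in parallel.
static List<string> OrderGenerators(Dictionary<string, List<string>> runAfter)
{
    var ordered = new List<string>();
    var visiting = new HashSet<string>();
    var done = new HashSet<string>();

    void Visit(string generator)
    {
        if (done.Contains(generator)) return;
        if (!visiting.Add(generator))
            throw new InvalidOperationException(
                $"Cycle detected involving generator '{generator}'.");

        if (runAfter.TryGetValue(generator, out var dependencies))
            foreach (var dependency in dependencies)
                Visit(dependency);

        visiting.Remove(generator);
        done.Add(generator);
        ordered.Add(generator); // dependencies land earlier in the list
    }

    foreach (var generator in runAfter.Keys)
        Visit(generator);

    return ordered;
}
```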

Sorry, bit new to the thread, but wouldn't it be possible to just take the order of execution as a list, set in the consuming csproj? Could just be a bunch of namespaces etc. Then the responsibility is on the consumer and developer documentation.

Additionally different libraries could read in this list and "validate" their ordering and return warnings if it's not as expected.

What I don't like about my suggestion is that it's polluting the csproj more, but to me it makes sense; if I wanted to know/manage my source generators one of the first places I would look is the csproj


This will prevent possible parallelization, I think

Suppose GeneratorA should run after GeneratorB, but GeneratorC doesn't have any dependency.
It should be possible to start running both GeneratorA and GeneratorC in parallel.

True. I hadn't considered that. You could group/nest sources though, right?

<Generators>
  <Step>
    <Source Type="GeneratorA" />
    <Source Type="GeneratorB" />
  </Step>
  <Step>
    <Source Type="GeneratorC" />
  </Step>
</Generators>

Each step runs in parallel internally, but steps run sequentially. So A and B run in parallel together, and C after.

Edit: sorry realized I messed up your example a bit. Here A and B have no dependencies but C does for example

Sorry bit new to the thread but wouldn't it be possible to just take the order of execution as a list, set in the running csproj? Could just be a bunch of namespaces etc. Then the responsibility is on the consumer and developer documentation.

Additionally different libraries could read in this list and "validate" their ordering and return warnings if it's not as expected.

What I don't like about my suggestion is that it's polluting the csproj more, but to me it makes sense; if I wanted to know/manage my source generators one of the first places I would look is the csproj

The problem I think is that this firmly puts the burden on the developer using the generator to know what it generates and how the dependencies between generators might play out.

I think the pit of success is rather that generators themselves define whether they handle output from other generators, or generate output that other generators (possibly dependencies) consume. A developer wouldn't really want to worry about these specific details and just expects to be able to use source generators in a way that they can build on one another, without needing an extra step.

Also this would introduce another point where this needs to be validated etc.

I'm not seeing any mechanism for this sort of system to be fast enough. Just having a single level of generators today is enormously impactful for perf. Even for things that used to be fast (like creating a ref-assembly) we now have to fully bind and create the initial compilation so we can run generators. If we then allow generators to run before/after, and we expect those generators to understand the semantics of that prior generated code, I do not see how we could do that without just huge perf hits.

I liked the idea of the dev specifying the order in .csproj. But sometimes it could be tedious, especially if the dev is using a package that they don't know has code being generated.
What about if we have both options? The default behavior is the same as we have now: the generators will execute without a defined order. But if we want to execute them in a particular order, we may specify the order in .csproj like mentioned by @sanchez #57239 (comment)?

I'm not sure ordering would ever be possible. Take the Razor generator. There are a decent amount of generators that would need to look at the code in the razor files while also letting those razor files consume the generated code. I think that leaves multi-pass runs as the only option.

I suspect, maybe with hopeless optimism, that if this multi-pass behavior was restricted to incremental generators that the output could be examined rather quickly to determine if more runs are required, with a failsafe to bail if recursion is detected.
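A minimal sketch of that multi-pass idea against the public GeneratorDriver API. The pass cap, the steady-state check, and the identifiers (initialCompilation, generators) are assumptions, not an existing Roslyn feature:

```csharp
// Hypothetical multi-pass driver loop; Roslyn runs a single pass today.
const int MaxPasses = 10; // failsafe to bail if generators never converge
Compilation compilation = initialCompilation;
GeneratorDriver driver = CSharpGeneratorDriver.Create(generators);

for (var pass = 0; pass < MaxPasses; pass++)
{
    driver = driver.RunGeneratorsAndUpdateCompilation(
        compilation, out var updated, out _);

    // Steady state: no generator produced a tree the previous pass lacked.
    if (updated.SyntaxTrees.Count() == compilation.SyntaxTrees.Count())
        break;

    compilation = updated; // feed generated output back in
}
```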

I'm not seeing any mechanism for this sort of system to be fast enough. Just having a single level of generators today, is enormously impactful for perf. Even for things that used to be fast (like creating a ref-assembly) we now have to fully bind and create the initial compilation so we can run generators. If we then allow generators to run before/after, and we expect those generators to understand the semantics of that prior generated code, I do not see how we could do that without just huge perf hits.

Performance is secondary, kind of. This is a feature that might have a decent performance hit, but as long as it's something that generators themselves have to implement in order for that to matter, I think the problem isn't that big.

I liked the idea of the dev specifying the order in .csproj. But sometimes it could be tedious, especially if the dev is using a package that they don't know has code being generated. What about if we have both options? The default behavior is the same as we have now: the generators will execute without a defined order. But if we want to execute them in a particular order, we may specify the order in .csproj like mentioned by @sanchez #57239 (comment)?

I think specifically that the default option being no order doesn't make a lot of sense. If I already know as the author of a generator that it will generate output that another generator has to process I want to be able to declare this in advance.

I'm not sure ordering would ever be possible. Take the Razor generator. There are a decent amount of generators that would need to look at the code in the razor files while also letting those razor files consume the generated code. I think that leaves multi-pass runs as the only option.

I suspect, maybe with hopeless optimism, that if this multi-pass behavior was restricted to incremental generators that the output could be examined rather quickly to determine if more runs are required, with a failsafe to bail if recursion is detected.

I think for Razor files specifically this is a difficult situation. Looking at the generated code, I don't think there is a reason why Razor itself would have this sort of circular dependency on the output of the source generator. It might be there, and since this is definitely one of the most visible use cases for chaining generators, we definitely have to think about it though.

The razor source generator might itself be split into two parts though. The first part generates a file that just has all the attributes and declares a partial class, and the second part implements the actual rendering pipeline. That way you could hook into the middle of the razor generator.
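To illustrate that split, here is a sketch of what the first pass's output could look like. The split itself, the Counter component, and the route are hypothetical; only the attribute and base class are real ASP.NET Core types:

```csharp
// Pass 1 (hypothetical): emit only the declaration shell that other
// generators need to see -- attributes plus an empty partial class.
[Microsoft.AspNetCore.Components.Route("/counter")]
public partial class Counter : Microsoft.AspNetCore.Components.ComponentBase
{
    // Pass 2 (hypothetical) would add the actual rendering pipeline in
    // another partial declaration, e.g. an override of BuildRenderTree.
}
```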

Performance needs to be primary. The perf here affects literally every semantic task at a higher layer.


Sure, I'm not saying it's completely irrelevant. I'm saying that as long as the behavior is opt-in, it shouldn't have any impact on current compile times. When there is a generator that needs this behavior, how performant it is becomes secondary, as long as it isn't abysmal.

Sure, I'm not saying its completely irrelevant, I'm saying that as long as the behavior is opt in, it shouldn't have any impact on current compile times.

It can't really be opt-in. If this becomes needed for some generator, then customers will have to enable this multi-pass generation. Which means that all performance will be impacted there. This is not a hypothetical. The perf hit of just a single pass of generators is enormous. This just exacerbates it, especially as it will likely become mandatory for some scenarios to work.


Ah, the multi-pass scenario is what I consider a bad solution though. I'd much rather have a directed acyclic graph of generators, so that only generators that actually depend on one another have to run in sequence. The rest can continue to run in parallel.

This shouldn't impact anything but the affected generators themselves, which might be problematic enough, but it shouldn't affect existing generators at all when you can still run them in parallel.

Ah, the multi pass scenario is what I consider a bad solution though. I'd much rather have a directed acyclic graph of generators so that only generators that actually depend on one another have to be run in sequence.

This is still problematic. Just consider the simple case of two generators, where one depends on the other. To actually produce semantics we have to:

  1. Produce initial Compilation-A.
  2. Run generator 'G1' on it to generate Sources-A'
  3. Produce Compilation-B by combining Sources-A' to Compilation-A
  4. Run generator 'G2' on it to generate Sources-B'.
  5. Produce Compilation-C by combining Sources-B' to Compilation-B.
  6. Compile and emit Compilation-C

This is a ton of very expensive work; three full compilations are made here.

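The six steps above can be sketched with today's GeneratorDriver API (g1, g2, the compilation variables, and peStream are placeholders), which makes the cost visible: each run materializes a brand-new Compilation:

```csharp
// Steps 2-3: run G1 against Compilation-A; the driver hands back
// Compilation-B = Compilation-A + Sources-A'.
GeneratorDriver d1 = CSharpGeneratorDriver.Create(g1);
d1 = d1.RunGeneratorsAndUpdateCompilation(compilationA, out var compilationB, out _);

// Steps 4-5: run G2 against Compilation-B to get Compilation-C.
GeneratorDriver d2 = CSharpGeneratorDriver.Create(g2);
d2 = d2.RunGeneratorsAndUpdateCompilation(compilationB, out var compilationC, out _);

// Step 6: compile and emit the final compilation.
var emitResult = compilationC.Emit(peStream);
```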

Right. I'm not saying I have an answer yet on how this can be achieved with good performance, just that it is a feature that's been requested again and again and that, with the increasing proliferation of source generators, will only add more value.

I'm wondering whether the compilation that is passed to a second generator really has to be a full compilation or whether there are ways to make this interface more lightweight.

Also my hope would be that incremental generators also reduce the overall amount of calls that need to be made to the source generators. This dependency feature is something I firmly believe shouldn't be added to the old V1 source generators.

Right. I'm not saying I have an answer yet on how this can be achieved with good performance, just that it is a feature that's been requested again and again and that, with the increasing proliferation of source generators, will only add more value.

I get the value side. But the value is only acceptable if it doesn't come with even more negative value from the perf hit there. As before, this is not speculative. We're still working on the perf problems introduced by generators (including incremental generators). We're not even close to back to where we want to be, and that's the current state of things. We can't layer on even more perf sinks when our perf is not even at an acceptable level today.

Also my hope would be that incremental generators also reduce the overall amount of calls that need to be made to the source generators.

Not really sure how that would work. But if you can come up with some mechanism to make that possible, that would be interesting.

Right. I'm not saying I have an answer yet on how this can be achieved with good performance, just that it is a feature that's been requested again and again and that, with the increasing proliferation of source generators, will only add more value.

I get the value side. But the value is only acceptable if it doesn't come with even more negative value from the perf hit there. As before, this is not speculative. We're still working on the perf problems introduced by generators (including incremental generators). We're not even close to back to where we want to be, and that's the current state of things. We can't layer on even more perf sinks when our perf is not even at an acceptable level today.

I think that depends on what you deem an acceptable level. I think we can agree however that we want this feature to be as performant as possible. Introducing this feature would introduce a performance hit on certain critical paths, but I don't really see a way to shorten that path either.

Also my hope would be that incremental generators also reduce the overall amount of calls that need to be made to the source generators.

Not really sure how that would work. But if you can come up with some mechanism to make that possible, that would be interesting.

There are a lot of things you can do with incremental generators that would minimize the impact of implementing this feature.

For example you wouldn't rerun dependent generators if their input didn't change, and there is nothing stopping you from including their output from the last compilation if their output won't change.

There's also the point that you could run all generators in parallel in the first iteration, then check if any of the generators would have their pipelines triggered and then selectively rerun those, discarding their previous output and creating a new compilation. My gut feeling is that there are a lot of things you can do, but it won't be easy.

What if there was an additional interface that a generator could implement in addition to IIncrementalGenerator that would be given the syntax trees of the code that was added by other source generators for that pass. Given that info the author could then opt-in for additional runs. Ideally the new method would have the ability to cache the pipelines just like the incremental generator, and secondary runs would likewise still have the ability to cache their pipelines too.


Interesting idea, you'd run into the problem that you can't force one generator to run before another one though. So support for handling other generators would have to be built into new generators.


I'm thinking the generators with the interface would eventually get the output from the other generators in subsequent runs. The method would just keep getting called until generators all agree they don't need to be rerun (or a failsafe tells them all to move it along).

Generators not implementing the method would be at the whims of Roslyn when they get run, which wouldn't be any worse than the current state.
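Sketched as code, the opt-in idea might look something like this. IMultiPassGenerator and its member are entirely hypothetical; nothing like this exists in Roslyn today:

```csharp
// Hypothetical opt-in interface layered on top of IIncrementalGenerator.
public interface IMultiPassGenerator : IIncrementalGenerator
{
    // Receives the syntax trees other generators added in the previous
    // pass; returning true requests another run against the updated
    // compilation (subject to a host-enforced failsafe cap).
    bool WantsAnotherPass(ImmutableArray<SyntaxTree> addedLastPass);
}
```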

The method would just keep getting called until generators all agree they don't need to be rerun

I don't see any viable path forward on that. Just running things until a hopeful fixed-point is reached seems like a recipe for disaster (Even with any sort of cap). If things have circular dependencies, they need to be redesigned to not be that way.

Kinda feels like either the design allows two generators to work off each other's outputs, with a failsafe to keep people from being ridiculous, or it's a strictly ordered system with a design that the Razor generator would almost immediately throw a wrench into.

and creating a new compilation. My gut feeling is that there are a lot of things you can do, but it won't be easy.

Any new compilation creates a linear additional cost on the entire pipeline. e.g. it's as if we're just recompiling the project 'one more time'.

The method would just keep getting called until generators all agree they don't need to be rerun

I don't see any viable path forward on that. Just running things until a hopeful fixed-point is reached seems like a recipe for disaster (Even with any sort of cap). If things have circular dependencies, they need to be redesigned to not be that way.

I agree that circular dependencies are probably not going to work, as they are evil (TM) by nature, and at least for a first implementation they should be avoided. I think running generators until they converge is doomed to fail, even if it's the naive implementation and probably much easier. I originally proposed it as one of the two solutions just so that it could be discussed; I honestly don't think it's a good idea.

Kinda feels either the design allows two generators to work off each others outputs with a fail safe to keep people from being ridiculous, or a strictly ordered system with a design that the Razor generator would almost immediately throw a wrench in.

The Razor generator doesn't necessarily have to throw a wrench in it. The only point where the Razor generator (I think) needs access to generated code is when it wants to use generated components. So the only non-supported use case I can think of off the top of my head is generating new components on the fly within .razor files. And even then you could order your generator to run before the Razor generator, taking the .razor files directly as input.

and creating a new compilation. My gut feeling is that there are a lot of things you can do, but it won't be easy.

Any new compilation creates a linear additional cost on the entire pipeline. e.g. it's as if we're just recompiling the project 'one more time'.

Right. The worst-case scenario is that we're going to have as many compilations as the dependency graph's longest path. This isn't something that can be helped directly. However, my hope is that this is an initial penalty on the first run, and that subsequent runs would be able to shorten that path by only rerunning generators whose inputs have changed, same as how incremental generators cache their results right now.

I think we all agree that we will definitely have to performance-test any implementation, to make sure that even a short chain of 2-3 generators will not triple compile times right off the bat and keep them tripled forever. Otherwise that might indeed lead to a situation where working with chained generators becomes a chore.

The other way: add Dependency Injection support.

[Generator]
public class TestCodeSourceGenerator : IIncrementalGenerator
{
    // This method would run first, across all generators.
    public void Install(IServiceCollection services)
    {
    }

    // When every Install has finished, Initialize runs.
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        context.RegisterSourceOutput(workshopTypes, GenerateCode!);
    }
}

[Generator]
public class _3rdPartyTestCodeSourceGenerator : IIncrementalGenerator
{
    // Only registers services; the host could sort, remove, or
    // overwrite generators internally via DI.
    public void Install(IServiceCollection services)
    {
        services.AddSingleton<BeforeRazorGenerator>();
    }

    public void Initialize(IncrementalGeneratorInitializationContext context) { }
}

veler commented

Hi there,

I'm hitting this issue too with a cross-platform project implying WASDK and some XAML. Basically hitting this more precisely: unoplatform/uno#9865

Using Uno Platform, CommunityToolkit.MVVM and ReswPlus in the same project is pretty much impossible.

All these libraries I mentioned above have generators. Let's say I'm using something from CommunityToolkit.MVVM to generate the details of a property Foo in a ViewModel automatically and bind this property to the UI in the XAML. When doing so, I simply can't build my project because Uno Platform generators won't find Foo because it hasn't been generated by CommunityToolkit.Mvvm yet.
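For concreteness, a minimal sketch of the clash. ObservableObject and [ObservableProperty] are real CommunityToolkit.Mvvm APIs; MyViewModel and the binding are made up for illustration:

```csharp
using CommunityToolkit.Mvvm.ComponentModel;

// CommunityToolkit.Mvvm generates a public 'Foo' property from this
// field, but the Uno XAML generator runs against a compilation in which
// 'Foo' does not exist yet, so a binding like {Binding Foo} fails.
public partial class MyViewModel : ObservableObject
{
    [ObservableProperty]
    private string foo;
}
```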

The workaround is to put all the XAML in project A, all the ViewModels in project B, and all the RESW in project C. But it unnecessarily complicates the solution architecture.

Now, by reading this thread, I do understand the big performance impact concern of sequentially running generators. I like the suggestion from @sanchez here

True. I hadn't considered that. You could group/nest sources though, right?

<Generators>
  <Step>
    <Source Type="GeneratorA" />
    <Source Type="GeneratorB" />
  </Step>
  <Step>
    <Source Type="GeneratorC" />
  </Step>
</Generators>

Each step runs in parallel internally, but steps run sequentially. So A and B run in parallel together, and C after.

Edit: sorry realized I messed up your example a bit. Here A and B have no dependencies but C does for example

I wouldn't expect Roslyn to figure out by itself which source generator depends on which other. I wouldn't expect source generator developers to set their own priority themselves. I would instead expect the consumer of the source generators (me) to define in what order generators A, B, C should run. Documentation for this feature would have to clearly state that build performance may be drastically reduced depending on what we're doing. If it's clearly documented that slower builds should be expected, I don't see why this wouldn't be an acceptable option.

I think it is pretty bad to let such an important issue fizzle out. Whatever the problems that need to be overcome, there should be a solution. The .NET runtime is using more and more generators, e.g. to solve AOT and trimming problems, and these can now not be used from within another generator. Good examples are the JSON and regex generators.

Another +1 here. I'm maintaining a JS interop library which provides various helpful tools absent in stock .NET (e.g., generating bindings based on C# interfaces, TypeScript definitions, etc.) and it worked fine prior to .NET 7, where the JS interop layer was changed and is now using source generators itself. My library also uses source generators, so it can't be migrated to the new interop. And the old interop is going to be deprecated in .NET 8. A dead end.

Source generators are spreading fast across the entire .NET runtime and there will be more such issues with community solutions in various areas (and eventually across .NET's own codebase, I believe); please consider giving the issue more attention.

I'd like to mention the things you can do with Swift macros and property wrappers…
I feel the C# source generator approach is falling way behind when an elementary problem such as cascading generators is not solved.

Can we expect this in .NET 8?

jcouv commented

Can we expect this in .NET 8?

No, this is not planned (Backlog milestone).

I made a helper method to run other source generators manually. In my case it's for [LibraryImport]. However, the code is not the best, and it relies on internals of Roslyn, so it may break in the future.

internal static class GeneratorRunner
{
    public static void Run(
        GeneratorExecutionContext context,
        string hintNamePrefix,
        IEnumerable<ISourceGenerator> generators,
        params SyntaxTree[] syntaxTrees)
    {
        var compilation = context.Compilation
            .RemoveAllSyntaxTrees()
            .AddSyntaxTrees(syntaxTrees);

        GeneratorDriver driver = CSharpGeneratorDriver.Create(generators: generators);

        driver = driver.RunGenerators(compilation);
        var runResult = driver.GetRunResult();

        foreach (var diagnostic in runResult.Diagnostics)
        {
            ReportDiagnostic(diagnostic);
        }

        foreach (var generatedSource in runResult.Results.SelectMany(result => result.GeneratedSources))
        {
            context.AddSource(GetHintName(generatedSource.HintName), generatedSource.SourceText);
        }

        void ReportDiagnostic(Diagnostic diagnostic)
        {
            // There will be an error if we report a diagnostic
            // from a different compilation so we create a new one.
            var newDiagnostic = Diagnostic.Create(
                diagnostic.Descriptor,
                Location.None,
                diagnostic.Severity,
                diagnostic.AdditionalLocations,
                diagnostic.Properties,
                // SimpleDiagnostic class and _messageArgs field are internal.
                // We use Krafs.Publicizer to access them.
                ((Diagnostic.SimpleDiagnostic)diagnostic)._messageArgs
            );

            context.ReportDiagnostic(newDiagnostic);
        }

        string GetHintName(string nestedHintName)
        {
            return hintNamePrefix switch
            {
                _ when hintNamePrefix.EndsWith(".g.cs") => hintNamePrefix[..^".g.cs".Length],
                _ when hintNamePrefix.EndsWith(".cs") => hintNamePrefix[..^".cs".Length],
                _ => hintNamePrefix,
            } + "__" + nestedHintName;
        }
    }
}

Usage example:

public void Execute(GeneratorExecutionContext context)
{
    var hintName = "Bindings.g.cs";

    var syntaxTree = SyntaxFactory.ParseSyntaxTree("""
internal static partial class Bindings
{
    [global::System.Runtime.InteropServices.LibraryImport("_Internal")]
    public static partial void Foo();
}
""",
        encoding: Encoding.UTF8
    );

    var libraryImportGeneratorType = Type.GetType(
        "Microsoft.Interop.LibraryImportGenerator, Microsoft.Interop.LibraryImportGenerator"
    )!;

    var libraryImportGenerator = ((IIncrementalGenerator)Activator.CreateInstance(libraryImportGeneratorType))
        .AsSourceGenerator();

    var generators = new[] { libraryImportGenerator };

    context.AddSource(hintName, syntaxTree.GetText());
    GeneratorRunner.Run(context, hintName, generators, syntaxTree);
}

Can we expect this in .NET 8?

No, this is not planned (Backlog milestone).

I am quite amazed that when asked (by the ASP.NET team, for AOT) even complex features like interceptors get implemented, but this topic, which is equally or even more important for AOT, remains in limbo for more than two years.

@b-straub Interceptors are being investigated as they are likely to drive actual language changes, and we have to understand that space in order to move forward on it. SGs are purely in the compiler, and the ability to run before/after is well understood, but still lacks suitable solutions to the significant problems there.

but still lacks suitable solutions to the significant problems there.

The complexity has been understood, what about the proposed and considered solution?

#68779 (comment)

The complexity has been understood, what about the proposed and considered solution?

All generators are built off of the idea that they operate on a fully representative version of the world (called the 'Compilation'). In either SG v1 or SG v2, this means that in order to:

choose another SG and pass my generated files for one more transform.

we would need to create a full new compilation, containing the existing code and the code you generated, to then run the other SG and get its outputs. Each of these transforms thus adds a 'full Compilation' step, which is an enormously (and currently unacceptably) high cost.

As nothing has changed with the costs, or the acceptability of those costs, this is still not a viable solution :(

As an alternative, have you considered making all the generators used internally by .NET accessible as libraries? This way the users will be able to use them in their own generators, which (with a bit of additional hassle) will solve the issue as well.

The thing is, the more you utilize generators internally in .NET runtime, the less useful generators become for end users. You just keep breaking more and more extension points for the community. This is a really frustrating and worrying trend.

You could certainly request that of the dotnet/runtime team :-)

I agree with @elringus. I don't think we need a model where all SGs are in a tree depending on each other, which adds complexity and cost. I think a layered or phased approach is more than enough.

  1. Roslyn built-in
  2. MS ones (like Razor, JSON, etc.)
  3. Community via NuGet (they can't depend on each other, only on the earlier layers)
  4. User code

Another idea would be to mark generators with the phase they want to run in, but of course this can lead to a 'wild west' in the community/NuGet space. But that's the same problem as with analyzers, and I think the community did a good job making sure the perf overhead is as minimal as possible.

@elringus

Would you take on the task of opening an issue with your idea in the runtime repository? Your suggestion would solve my case and I am sure many others as well.

notour commented

Are SGs executed sequentially or in parallel on the same code tree?

If SGs are executed sequentially, each SG could have an attribute expressing a dependency.
This way the order could be defined at compile time, like library dependencies.
This method allows more flexibility: ordering happens automatically through the dependency tree, with error detection for things like cyclic references.

[GeneratorSequenceDependency("....", Order.After | Order.Before)]

If SGs run in parallel it is more difficult, and I understand the point of @CyrusNajmabadi.
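Spelled out in C#, the attribute idea above might look roughly like this. This is purely hypothetical: no such attribute exists in Roslyn today, and the `Order` enum and the Razor generator's type name are illustrative placeholders.

```csharp
using System;

// Hypothetical ordering attribute -- nothing like this exists in Roslyn today.
[Flags]
public enum Order { Before = 1, After = 2 }

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class GeneratorSequenceDependencyAttribute : Attribute
{
    public GeneratorSequenceDependencyAttribute(string generatorTypeName, Order order)
    {
        GeneratorTypeName = generatorTypeName;
        Order = order;
    }

    public string GeneratorTypeName { get; }
    public Order Order { get; }
}

// Usage: declare that this generator must see the Razor generator's output.
// The type name below is an illustrative guess at the Razor generator's name.
[GeneratorSequenceDependency(
    "Microsoft.NET.Sdk.Razor.SourceGenerators.RazorSourceGenerator", Order.After)]
public sealed class MyRouteTableGenerator /* : IIncrementalGenerator */ { }
```

The driver could then read these attributes at load time and build a dependency graph, failing fast on cycles.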

@elringus

Would you take on the task of opening an issue with your idea in the runtime repository? Your suggestion would solve my case and I am sure many others as well.

I've actually already asked this via dotnet/runtime#87346 (reply in thread) for the new JS interop, and it looks like they actually have plans for something like this, but they are long-term. I'm currently looking for alternative solutions, including switching out of C#/.NET entirely.

You could certainly request that of the dotnet/runtime team :-)

I don't think that language, compiler, and runtime live on different planets, considering that .NET development is largely driven by MS teams. Similar to how other teams can request new features from the language/compiler, it should be possible the other way around.

Sure, as a developer I can make a request to the runtime, but that won't be the same as a coordinated internal effort to address the problem that more and more AOT enhancements (JSON, Regex, JSInterop, ...) are inaccessible from SG usage. This is not a small problem. Either the whole SG concept is not well enough thought out, or the negative effects that are now coming to light are not being addressed actively enough.

The complexity has been understood, what about the proposed and considered solution?

All generators are built on the idea that they operate on a fully representative version of the world (called the 'Compilation'). In either SG v1 or SG v2, this means that in order to:

choose another SG and pass my generated files for one more transform.

we would need to create a full new compilation, containing the existing code and the code you generated, to then run the other SG and get its outputs. Each of these transforms thus adds a 'full Compilation' step, which is an enormously (and currently unacceptably) high cost.

As nothing has changed with the costs, or the acceptability of those costs, this is still not a viable solution :(

@CyrusNajmabadi

I think as mentioned many times before the solution is to make dependencies explicit. That means that if one source generator has a dependency on another for processing their output it will run before that one. Obviously this means there can't be any loops and it needs to be a strictly acyclic graph.

Compilation would then require as many passes as the longest chain in that tree, which, while unfortunate because it would increase compilation times, is a fair trade-off for the added utility in my opinion, and apparently in the opinion of many others as well.

Is there a specific reason why a compilation can't/isn't amended by the new source files instead of having to be done from scratch? I don't understand where the extreme overhead is coming from here in the first place, but I haven't dug too deep into compiler internals.

I know that you can't just run the source generators that come after you only on your output (as a dependent SG) because some source generators need the whole compilation (see SGs for DI frameworks).

Have you done some actual exploration into how it would affect compile times? In the beginning it shouldn't affect any compile times as there is no defined dependency between any existing source generator so far and even once a few come out I don't expect there to be more than 2-3 passes needed in order to run the chain to its end. You can parallelize all SGs in the same stage after all.
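The staged scheme being argued for here (run independent generators in parallel, dependents in later passes) boils down to a Kahn-style topological layering with cycle detection. A rough sketch, with illustrative names standing in for generator instances; none of this is Roslyn API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GeneratorStaging
{
    // runsBefore: edge A -> B means "A must run before B".
    // Every generator must appear as a key (possibly with an empty list).
    public static List<List<string>> ComputeStages(
        Dictionary<string, List<string>> runsBefore)
    {
        var inDegree = runsBefore.Keys.ToDictionary(g => g, _ => 0);
        foreach (var targets in runsBefore.Values)
            foreach (var t in targets)
                inDegree[t]++;

        var stages = new List<List<string>>();
        var ready = inDegree.Where(kv => kv.Value == 0)
                            .Select(kv => kv.Key).ToList();
        int processed = 0;
        while (ready.Count > 0)
        {
            stages.Add(ready); // everything within a stage can run in parallel
            processed += ready.Count;
            var next = new List<string>();
            foreach (var g in ready)
                foreach (var t in runsBefore[g])
                    if (--inDegree[t] == 0)
                        next.Add(t);
            ready = next;
        }

        if (processed != inDegree.Count)
            throw new InvalidOperationException("Cyclic generator dependency detected.");
        return stages; // stages.Count == number of full-compilation passes needed
    }
}
```

With no declared dependencies, everything lands in one stage and nothing changes versus today; a single "Razor before my generator" edge yields two stages, i.e. one extra compilation pass, which is exactly the cost being debated.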

Are SGs executed sequentially or in parallel on the same code tree?

They all operate concurrently on the same input, producing independent outputs.

I think as mentioned many times before the solution is to make dependencies explicit. That means that if one source generator has a dependency on another for processing their output it will run before that one. Obviously this means there can't be any loops and it needs to be a strictly acyclic graph.

This doesn't solve the primary concern that has been brought up since the beginning (and which I reiterated in my post). Doing that approach fundamentally means producing N full compilations for any operation. It's already problematic where N=2. Extending that model further isn't currently tenable.

Is there a specific reason why a compilation can't/isn't amended by the new source files instead of having to be done from scratch?

There is no such concept as 'amended by the new source file'. It's unclear what that would even mean or do. If you add new stuff, presumably the intent is for it to be used, so it needs to be visible to the rest of the compilation being compiled.

Furthermore, Roslyn is a big, immutable graph. You can't have a fully realized, immutable graph and then suddenly have more items in it.

Finally, a core principle of SGs is that if you took all the code generated, wrote it out to files, then recompiled those files with the original sources, you would have the same output. If 'amended by the new source files' doesn't impact the rest of the code, this principle is broken.

I don't expect there to be more than 2-3 passes needed in order to run the chain to its end.

Each pass is already too expensive. Even the single pass we have today is the source of major problems. :-/

There is no such concept as 'amended by the new source file'. It's unclear what that would even mean or do. If you add new stuff, presumably the intent is for it to be used, so it needs to be visible to the rest of the compilation being compiled.

Yeah I've had a discussion on the C# Discord about this as well, I thought that this would somehow be feasible but there are quite a few places where it would change the meaning of what's being done, such as partial classes, static constructors, replacing default constructors, static partial void methods, etc.

I was wondering whether there would be a way to cut down on the amount of work that has to be done for each pass, and whether it wouldn't somehow be possible to restrict source generators that require other generators to a tighter set of inputs that could be generated much faster, though it's a long shot.

Even right now with SG v2 there is a way to skip the run of source generators for cases where the inputs haven't changed. If that check could be made fast enough, and if most of the time you would just skip running the intermediary source generator and use the cached generated source, it should bring the total time closer to that of a single run, aside from the initial loading/first compile.

I think the problems with time you are referring to are mostly related to design time builds and not release builds?

I'm still wondering also whether having this as an opt-in (shoot-yourself in the foot if you know what you are doing) kind of feature would be nice. With small assemblies the compile overhead should be quite low after all.

Yeah I've had a discussion on the C# Discord about this as well, I thought that this would somehow be feasible but there are quite a few places where it would change the meaning of what's being done, such as partial classes, static constructors, replacing default constructors, static partial void methods, etc.

That's definitely one way the meaning changes. But also, critically, the entire model of Roslyn doesn't support this concept at all. From the above post, we expose a "full immutable graph" of everything. So there's no way for that graph to be full and complete, but then have a section added to it. How would that work? Either the graph is fully reachable and done, or you could observe that it was not complete and that it changed :)

We solve this in the SG space by having each step be a full, new, complete graph. Which, as you might imagine, is quite expensive.

Perhaps we could move the model away from this. But now we're upending the very foundations of our compilation model in the first place. Note that SGs have already upended this to some extent, and have come with huge costs to show for it. Continuing down that path does not seem more viable, it seems less :(

Perhaps we could move the model away from this. But now we're upending the very foundations of our compilation model in the first place. Note that SGs have already upended this to some extent, and have come with huge costs to show for it. Continuing down that path does not seem more viable, it seems less :(

I think the question is what the future of SGs is supposed to be. As some have noted what is especially worrying for them is that parts of the runtime (such as JSInterop) are moving to a new source generator form while the "old-way" of doing things is getting deprecated. However any code coming from an SG does not have access to these features anymore.

One way I could think of is to do what the Razor generator still allows, that is, provide a fallback two-stage process; however, that is basically what I'm proposing by saying: "opt in to shooting yourself in the foot with a multistage process".

What they do is they limit their processing to a subset of files they know will impact their final result, this is of course much more difficult with source generators as a whole.

Coming back around to the initial quote: if the runtime itself starts to make more and more use of SGs, then it makes sense to me at least that there is a way for other source generators to leverage that. It might be in the form of a library that you can import (as discussed above), that you can then just pass your generated source to and that does its own postprocessing, or it could be in the form of some native Roslyn chaining where SGs would only have to declare dependencies.

I think the problem of not being able to run these generators off of each other is simply not going away and is only going to become larger as time goes on. It's a great language/compiler feature that SGs even exist, so let's try to make them even more useful.

Each pass is already too expensive. Even the single pass we have today is the source of major problems. :-/

Personally, and this is of course just anecdotal evidence, I never had any issues even with larger projects that had multiple source generators, as long as they were v2s or really fast v1s. Seeing the pass time doubled doesn't sound that bad here, but again I don't have concrete numbers to show.

Are there concrete benchmarks you are referring to when talking about the single pass being a source of major problems?

Each of these transforms thus adds a 'full Compilation' step, which is an enormously (and currently unacceptably) high cost.

As nothing has changed with the costs, or the acceptability of those costs, this is still not a viable solution :(

Each pass is already too expensive. Even the single pass we have today is the source of major problems. :-/

@CyrusNajmabadi

Ok, I understand, it wouldn't be fast.

But it is not logical to talk about an "enormously (and currently unacceptably) high cost" for a feature that would be opt-in for those who want it.

It's not like the solutions in the comments suggest that you force everyone to pay this high cost.

Is it not possible to provide the feature and let the users decide if they want to use it or not?

I would gladly run the whole build two times in a row, manually if need be, first Build and then Build again with my SG if that would mean that I could use the feature.

Why don't you let us decide if the high cost is acceptable for us or not? The rest of users could continue to use Build.

I would gladly run the whole build two times in a row, manually if need be, first Build and then Build again with my SG if that would mean that I could use the feature.

If you're manually doing things, you can accomplish this today. Just write manual tools using the Roslyn api that do analysis, and just generate normal source files into some location that you then run with the normal compile.
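The "do it manually today" suggestion can be sketched with public Roslyn APIs (the `Microsoft.CodeAnalysis.CSharp` NuGet package): a small tool, run before the real build, parses the sources, inspects the compilation, and writes ordinary `.cs` files that the normal compile (and every generator in it) then sees as real inputs. The paths and the `GenerateRoutes` helper below are illustrative placeholders, not part of any real project.

```csharp
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Parse every .cs file under src/ into syntax trees.
var trees = Directory.EnumerateFiles("src", "*.cs", SearchOption.AllDirectories)
    .Select(path => CSharpSyntaxTree.ParseText(File.ReadAllText(path), path: path));

// Build a throwaway compilation purely for analysis.
var compilation = CSharpCompilation.Create(
    "Analysis",
    trees,
    new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
    new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

// Emit an ordinary source file the real build will pick up.
File.WriteAllText(
    Path.Combine("src", "Generated", "Routes.g.cs"),
    GenerateRoutes(compilation));

// Illustrative placeholder: walk compilation.GlobalNamespace, collect
// route attributes, and emit a class of route constants.
static string GenerateRoutes(Compilation compilation)
    => "// generated routes go here";
```

Because the generated files land on disk before compilation starts, they are visible to every source generator in the subsequent build, which is precisely what in-pipeline chaining cannot offer today.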

Why don't you let us decide if the high cost is acceptable for us or not?

Because we have to design, implement, and maintain such a system for likely a decade or more. And what happens in practice is customers say "hey, I started using X, and my performance tanked. I blame you and you need to fix it. Or we stop paying" :) And saying "sorry, you opted into using layered generators, so things will be bad" is not something we find people accept at all in these discussions.

Because we have to design, implement, and maintain such a system for likely a decade or more. And what happens in practice is customers say "hey, I started using X, and my performance tanked. I blame you and you need to fix it. Or we stop paying" :) And saying "sorry, you opted into using layered generators, so things will be bad" is not something we find people accept at all in these discussions.

A strange argument: my son's MacBook Pro 14 compiled my large SG-based project approx. 10 times faster than my two-year-old high-end Wintel notebook (real measurements). A decade is, computer-wise, like a century.

If you're manually doing things, you can accomplish this today.

@CyrusNajmabadi

Can I really write a source generator that reads the C# classes generated from razor files in a Blazor project and adds a method if it is missing?

If I can do that manually today, I would really like to know how? :)

If you're manually doing things, you can accomplish this today.

@CyrusNajmabadi

Can I really write a source generator that reads the C# classes generated from razor files in a Blazor project and adds a method if it is missing?

If I can do that manually today, I would really like to know how? :)

Since Razor components are partial classes, I think you can generate another part with the missing method.

If you're manually doing things, you can accomplish this today.

@CyrusNajmabadi
Can I really write a source generator that reads the C# classes generated from razor files in a Blazor project and adds a method if it is missing?
If I can do that manually today, I would really like to know how? :)

Since Razor components are partial classes, I think you can generate another part with the missing method.

You can only do that if you use the two-step compile process for Razor files. Normally Razor files are compiled with an SG and you can't use the output in any other SG. With the two-step process it's possible. It's also possible to make a code-behind .cs file that SGs can read as normal.
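The code-behind workaround looks like this: because a `.razor` component compiles into a partial class, a sibling `.razor.cs` file can declare another part of that class, and ordinary source generators see that file as a normal compilation input. The component name, route, and parameter below are illustrative.

```csharp
// MyComponent.razor.cs -- a hand-written part of the component produced
// from MyComponent.razor. Other source generators can see this file,
// unlike the part the Razor generator emits.
using Microsoft.AspNetCore.Components;

[Route("/my-component")]          // visible to other generators,
public partial class MyComponent  // unlike an @page directive in the .razor file
{
    [Parameter]
    public string? Title { get; set; }
}
```

The cost is duplication: every attribute you want other generators to observe has to live in the code-behind rather than in Razor syntax.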

My problem with razor files right now is that I can't even analyze some parts of the razor file before the razor generator even runs.

For example, I really want to build a source generator that looks at all the [Parameter] properties on a Blazor component and generates some stuff. To opt Blazor components into my source generator, the easiest thing to do would be:

@attribute [MyAttribute]

However, there is currently NO way that I am aware of to detect an attribute that is created this way. Well, technically, it's still part of the attribute list in the semantic model, I believe. There's just no way to detect it through a syntax receiver.

The solution I've seen in some projects is to put the attribute on a partial class

[MyAttribute]
public partial class MyComponent

See the tests in this project https://github.com/excubo-ag/Generators.Blazor/blob/main/Tests_Blazor/SetParametersAsyncGenerator/EmptyClass.cs

However, for the use case of my source generator, having to create a code-behind file just to apply the attribute simply isn't worth it.

@biegehydra
In one of our source generators we consume all Razor files as AdditionalTexts (with the same filter criteria that RazorSourceGenerator uses). Then we parse those files with a regex instead of consuming syntax or semantic nodes. This way we can handle the most common cases. Because a regex is obviously not bulletproof, since it just parses the raw text, we also have an analyzer, which we execute for all "_razor.g.cs" and "_cshtml.g.cs" files. The analyzer checks whether what we generated based on the regex matches what we expect. Otherwise it errors out with a link to the relevant known-issues page.
Maybe you are lucky as we were and you can get all the information you need from a regex.

This is actually not the only workaround we had to do. In another case we cannot get the type of an expression if it depends on a type generated by another source generator. We then get an IInvalidOperation and just assume that we have a conflict with another source generator, instead of doing the proper checks.

Whilst the workarounds we implemented are quite messy, using many source generators works now quite nicely for us. I am however not sure if it stays this way, since Microsoft is releasing more and more source generators. It is possible that we get in a situation, where we cannot use some of those source generators.

So while "Source generators have to be independent" is a quite reasonable limitation, in practice it unfortunately might not show up as limitation but as buggy behavior instead. In our case it even shows up as silent fails, which is especially nasty.

That this limitation can show up as buggy behavior might actually be a chance. It means that a solution that is not perfect would still be a big improvement.

So instead of creating more full compilations than the current two, generators could get an "estimated" incremental semantic model of just the code added by other source generators. The symbols in this incremental compilation would bind primarily to other symbols in the incremental compilation and secondarily to the initial full compilation. All other code (that was not generated by a source generator) would not be recompiled or rebound. The result is likely quite wrong according to the rules of the language, but the hypothesis is that it will be good enough for source generators to reason about. Since such an increment considers only the code generated by the source generators, it should be reasonably small and thus fast to create.

Source generators would be able to refine the code they already generated, and this code could be fed back into all generators until it converges (which is usually quite soon, as others have already noted). After this step, all the generated code can be fed into the second full compilation, where everything gets correctly bound.

If a source generator did generate something wrong based on the incorrect intermediate semantic model, this can lead to a compile error or a silent fail. Either way, we are not worse off than we are now. Currently we get a not-very-helpful error when trying to use the Regex source generator in a Razor page, or we get a silent fail, as is the case with one of our source generators. So the way it can fail would stay the same; it would just be much less likely to happen.
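The AdditionalTexts-plus-regex approach described above can be sketched with the incremental generator pipeline. This is a simplified illustration, not the commenter's actual code: the regex only handles a bare `@page "..."` directive, the `.razor` files must be passed to the compiler as AdditionalFiles, and the generated output format is made up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.CodeAnalysis;

[Generator]
public sealed class RazorRouteScanner : IIncrementalGenerator
{
    // Illustrative: matches only simple @page directives at line start.
    private static readonly Regex PageDirective =
        new(@"^@page\s+""(?<route>[^""]+)""", RegexOptions.Multiline);

    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        var routes = context.AdditionalTextsProvider
            .Where(static t => t.Path.EndsWith(".razor", StringComparison.OrdinalIgnoreCase))
            .Select(static (t, ct) => t.GetText(ct)?.ToString() ?? "")
            .SelectMany(static (text, _) => PageDirective.Matches(text)
                .Cast<Match>()
                .Select(m => m.Groups["route"].Value));

        context.RegisterSourceOutput(routes.Collect(), static (spc, all) =>
            spc.AddSource("Routes.g.cs",
                "static class Routes { public static readonly string[] All = { "
                + string.Join(", ", all.Select(r => $"\"{r}\"")) + " }; }"));
    }
}
```

The raw-text parsing never needs the Razor generator's output, which is why it sidesteps the ordering problem, at the cost of the fragility the commenter mitigates with a verification analyzer.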

Hello,

To avoid creating one more proposal/issue I decided to comment here and you all can tell me if I'm wrong or should create a new proposal/issue.

I am currently using the .NET Community Toolkit for MVVM and [ObservableProperty] to generate public properties for fields marked with this attribute. I also have a wish to use Mapperly, a source generator for generating object mappings.

However as I noticed, Mapperly is not able to detect the public properties that are generated by the MVVM Community Toolkit. My initial assumption was that this might be an issue with Mapperly, so I opened an issue which was closed as this was an issue with Roslyn:
riok/mapperly#883

So, would the work related to this feature, enable Mapperly to find the public properties generated by the Toolkit?
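For reference, a minimal sketch of the situation described in that issue, assuming the CommunityToolkit.Mvvm package. The Toolkit's generator expands the attributed field into a public `Name` property in another partial declaration, but a second generator such as Mapperly runs against the same original compilation, where that property does not exist yet.

```csharp
using CommunityToolkit.Mvvm.ComponentModel;

public partial class PersonViewModel : ObservableObject
{
    // The MVVM Toolkit generator emits a public "Name" property for this
    // field in a separate generated partial declaration of PersonViewModel.
    [ObservableProperty]
    private string? name;
}

public class PersonDto
{
    public string? Name { get; set; }
}

// A Mapperly mapper like the one below sees PersonViewModel *without* the
// generated Name property, so the mapping it generates cannot copy it,
// even though the final compiled type does have the property.
//
// [Mapper]
// public partial class PersonMapper
// {
//     public partial PersonDto ToDto(PersonViewModel vm);
// }
```
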