February 2015

Volume 30 Number 2


C# - Adding a Code Fix to Your Roslyn Analyzer

By Alex Turner | February 2015

If you followed the steps in my previous article, “Use Roslyn to Write a Live Code Analyzer for Your API” (msdn.microsoft.com/magazine/dn879356), you wrote an analyzer that displays live errors for invalid regular expression (regex) pattern strings. Each invalid pattern gets a red squiggle in the editor, just as you’d see for compiler errors, and the squiggles appear live as you type your code. This is made possible by the new .NET Compiler Platform (“Roslyn”) APIs, which power the C# and Visual Basic editing experiences in Visual Studio 2015.

Can you do more? If you’ve got the domain knowledge to see not just what’s wrong but also how to fix it, you can suggest the relevant code fix through the new Visual Studio light bulb. This code fix lets a developer using your analyzer not only find an error in his code but also clean it up instantly.

In this article, I’ll show you how to add a code fix provider to your regex diagnostic analyzer that offers fixes at each regex squiggle. The fix will be added as an item in the light bulb menu, letting the user preview the fix and apply it to her code automatically.

Picking Up Where You Left Off

To get started, be sure you’ve followed the steps in the previous article. In that article, I showed you how to write the first half of your analyzer, which generates the diagnostic squiggles under each invalid regex pattern string. That article walked you through:

  • Installing Visual Studio 2015, its SDK and the .NET Compiler Platform ("Roslyn") SDK.
  • Creating a new Analyzer with Code Fix project.
  • Adding code to DiagnosticAnalyzer.cs to implement the invalid regex pattern detection.

If you’re looking to quickly catch up, check out Figure 1, which lists the final code for DiagnosticAnalyzer.cs.

Figure 1 The Complete Code for DiagnosticAnalyzer.cs

using System;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;
namespace RegexAnalyzer
{
  [DiagnosticAnalyzer(LanguageNames.CSharp)]
  public class RegexAnalyzerAnalyzer : DiagnosticAnalyzer
  {
    public const string DiagnosticId = "Regex";
    internal const string Title = "Regex error parsing string argument";
    internal const string MessageFormat = "Regex error {0}";
    internal const string Category = "Syntax";
    internal static DiagnosticDescriptor Rule =
      new DiagnosticDescriptor(DiagnosticId, Title, MessageFormat,
      Category, DiagnosticSeverity.Error, isEnabledByDefault: true);
    public override ImmutableArray<DiagnosticDescriptor>
      SupportedDiagnostics { get { return ImmutableArray.Create(Rule); } }
    public override void Initialize(AnalysisContext context)
    {
      context.RegisterSyntaxNodeAction(
        AnalyzeNode, SyntaxKind.InvocationExpression);
    }
    private void AnalyzeNode(SyntaxNodeAnalysisContext context)
    {
      var invocationExpr = (InvocationExpressionSyntax)context.Node;
      var memberAccessExpr =
        invocationExpr.Expression as MemberAccessExpressionSyntax;
      if (memberAccessExpr?.Name.ToString() != "Match") return;
      var memberSymbol = context.SemanticModel.
        GetSymbolInfo(memberAccessExpr).Symbol as IMethodSymbol;
      if (!memberSymbol?.ToString().StartsWith(
        "System.Text.RegularExpressions.Regex.Match") ?? true) return;
      var argumentList = invocationExpr.ArgumentList as ArgumentListSyntax;
      if ((argumentList?.Arguments.Count ?? 0) < 2) return;
      var regexLiteral =
        argumentList.Arguments[1].Expression as LiteralExpressionSyntax;
      if (regexLiteral == null) return;
      var regexOpt = context.SemanticModel.GetConstantValue(regexLiteral);
      if (!regexOpt.HasValue) return;
      var regex = regexOpt.Value as string;
      if (regex == null) return;
      try
      {
        System.Text.RegularExpressions.Regex.Match("", regex);
      }
      catch (ArgumentException e)
      {
        var diagnostic =
          Diagnostic.Create(Rule, regexLiteral.GetLocation(), e.Message);
        context.ReportDiagnostic(diagnostic);
      }
    }
  }
}

Transforming Immutable Syntax Trees

Last time, when you wrote the diagnostic analyzer to detect invalid regex patterns, the first step was to use the Syntax Visualizer to identify patterns in the syntax trees that indicated problem code. You then wrote an analysis method that ran each time the relevant node type was found. The method checked for the pattern of syntax nodes that warranted an error squiggle.

Writing a fix is a similar process. You deal in syntax trees, focusing on the desired new state of the code files after the user applies your fix. Most code fixes involve adding, removing or replacing syntax nodes from the current trees to produce new syntax trees. You can operate directly on syntax nodes or use APIs that let you make project-wide changes, such as renames.

One very important property to understand about the syntax nodes, trees and symbols in the .NET Compiler Platform is that they’re immutable. Once a syntax node or tree is created, it can’t be modified—a given tree or node object will always represent the same C# or Visual Basic code.

Immutability in an API for transforming source code may seem counterintuitive. How can you add, remove and replace the child nodes in a syntax tree if neither the tree nor its nodes can be changed? It’s helpful here to consider the .NET String type, another immutable type you most likely use every day. You transform strings quite often, concatenating them and replacing substrings using String.Replace. However, none of these operations actually changes the original string object. Instead, each call returns a new string object that represents the new state of the string. You can assign this new object back to your original variable, but any method you passed the old string to still sees the original value.
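To make the analogy concrete, here is a small sketch in plain C# (no Roslyn required); the StringImmutabilityDemo class is a hypothetical name used only for illustration. It keeps a second reference to a string, “transforms” the string twice, and shows that the saved reference still sees the original value:

```csharp
using System;

// Hypothetical demo class illustrating the String analogy from the text.
public static class StringImmutabilityDemo
{
    public static string[] Run()
    {
        string pattern = "[a-z";
        string saved = pattern;               // a second reference to the same object

        pattern = pattern.Replace("[", "(");  // Replace returns a NEW string: "(a-z"
        pattern = pattern + ")";              // concatenation returns another NEW string

        // 'saved' still refers to the original, unchanged object.
        return new[] { saved, pattern };
    }
}
```

Run returns "[a-z" for the saved reference and "(a-z)" for the reassigned variable, which is exactly the behavior syntax trees mimic: every “change” yields a new object.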

Adding a Parameter Node to an Immutable Tree

To explore how immutability applies to syntax trees, you’ll perform a simple transform manually in the code editor and see how it affects the syntax tree.

Inside Visual Studio 2015 (with the Syntax Visualizer tool installed, as described in the previous article), create a new C# code file. Replace all of its contents with the following code:

class C
{
  void M()
  {
  }
}

Open up the Syntax Visualizer by choosing View | Other Windows | Syntax Visualizer and click anywhere within the code file to populate the tree. In the Syntax Visualizer window, right-click the root CompilationUnit node and choose View Directed Syntax Graph. Visualizing this syntax tree results in a graph like the one in Figure 2 (the graph shown here omits the gray and white trivia nodes that represent whitespace). The blue ParameterList syntax node has two green child tokens representing its parentheses and no blue child syntax nodes, as the list contains no parameters.

Figure 2 Syntax Tree Before the Transform

The transform you’ll simulate here is one that would add a new parameter of type int. Type the code “int i” within the parentheses of method M’s parameter list and watch the changes within the Syntax Visualizer as you type:

class C
{
  void M(int i)
  {
  }
}

Note that even before you finish typing, when your incomplete code contains compile errors (shown in the Syntax Visualizer as nodes with a red background), the tree is still coherent, and the compiler guesses that your new code will form a valid Parameter node. This resilience of the syntax trees to compiler errors is what allows IDE features and your diagnostics to work well against incomplete code.

Right-click on the root CompilationUnit node again and generate a new graph, which should look like Figure 3 (again, depicted here without trivia).

Figure 3 Syntax Tree After the Transform

Note that the ParameterList now has three children: the two parenthesis tokens it had before, plus a new Parameter syntax node. As you typed “int i” in the editor, Visual Studio replaced the document’s previous syntax tree with this new syntax tree that represents your new source code.
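If you want to confirm these child counts without the visualizer, a minimal sketch like the following lists the kinds of a ParameterList’s direct children. It assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package, and the ParameterListInspector class is hypothetical:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper; assumes Microsoft.CodeAnalysis.CSharp is referenced.
public static class ParameterListInspector
{
    // Parses the code and returns the kinds of the first ParameterList's
    // direct children, both nodes and tokens, in source order.
    public static string[] ChildKinds(string code)
    {
        var root = CSharpSyntaxTree.ParseText(code).GetRoot();
        var parameterList = root.DescendantNodes()
            .OfType<ParameterListSyntax>()
            .First();
        return parameterList.ChildNodesAndTokens()
            .Select(child => child.Kind().ToString())
            .ToArray();
    }
}
```

For the before code it should report OpenParenToken and CloseParenToken; for the after code, OpenParenToken, Parameter and CloseParenToken, matching Figures 2 and 3.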

Performing a full replacement works well enough for small strings, which are single objects, but what about syntax trees? A large code file may contain thousands or tens of thousands of syntax nodes, and you certainly don’t want all of those nodes to be recreated every time someone types a character within a file. That would generate tons of orphaned objects for the garbage collector to clean up and seriously hurt performance.

Luckily, the immutable nature of the syntax nodes also offers a way out here. Because most of the nodes in the document aren’t affected when you make a small change, those nodes can be safely reused as children in the new tree. The behind-the-scenes internal node that stores the data for a given syntax node points only downward to the node’s children. Because those internal nodes don’t have parent pointers, it’s safe for the same internal node to show up over and over again in many iterations of a given syntax tree, as long as that part of the code remains the same.

This node reuse means that the only nodes in a tree that need to be recreated on each keystroke are those with at least one descendant that has changed, namely the narrow chain of ancestor nodes up to the root, as depicted in Figure 4. All other nodes are reused as is.

Figure 4 The Ancestor Nodes Replaced During the Transform

In this case, the core change is to create your new Parameter node and then replace the ParameterList with a new ParameterList that has the new Parameter inserted as a child node. Replacing the ParameterList also requires replacing the chain of ancestor nodes, as each ancestor’s list of child nodes changes to include the replaced node. Later in this article, you’ll do that kind of replacement for your regex analyzer with the SyntaxNode.ReplaceNode method, which takes care of replacing all the ancestor nodes for you.
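As a sketch of the same transform performed in code rather than in the editor, the following assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package; the AddParameterDemo class is hypothetical. It parses the before code, builds an “int i” Parameter node with SyntaxFactory, and uses ReplaceNode to produce a new root. Because the original tree is immutable, its text is unchanged afterward:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo class; assumes Microsoft.CodeAnalysis.CSharp is referenced.
public static class AddParameterDemo
{
    public static string[] AddIntParameter()
    {
        // The "before" code from Figure 2.
        SyntaxNode root = CSharpSyntaxTree.ParseText(
            "class C\n{\n  void M()\n  {\n  }\n}\n").GetRoot();

        // Find method M's empty ParameterList.
        var oldList = root.DescendantNodes()
            .OfType<MethodDeclarationSyntax>()
            .Single()
            .ParameterList;

        // Build the Parameter node "int i". The trailing space keeps the
        // keyword and the identifier from running together.
        var parameter = SyntaxFactory.Parameter(SyntaxFactory.Identifier("i"))
            .WithType(SyntaxFactory.PredefinedType(
                    SyntaxFactory.Token(SyntaxKind.IntKeyword))
                .WithTrailingTrivia(SyntaxFactory.Space));

        // AddParameters returns a NEW ParameterList; oldList is untouched.
        var newList = oldList.AddParameters(parameter);

        // ReplaceNode returns a NEW root in which the ParameterList and its
        // ancestors (MethodDeclaration, ClassDeclaration, CompilationUnit)
        // are replaced.
        var newRoot = root.ReplaceNode(oldList, newList);

        return new[] { root.ToFullString(), newRoot.ToFullString() };
    }
}
```

The first string still contains “void M()”, while the second contains “void M(int i)”: two trees, with the old one left intact.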

You’ve now seen the general pattern for planning a code fix: You start with code in the before state that triggers the diagnostic. Then you manually make the changes the fix should make, observing the effect on the syntax tree. Finally, you work out the code needed to create the replacement nodes and return a new syntax tree that contains them.

Be sure you’ve got your project open, containing the diagnostic you created last time. To implement your code fix, you’ll dig into CodeFixProvider.cs.

FixableDiagnosticIds Property

Fixes and the diagnostics they resolve are loosely coupled by diagnostic IDs. Each code fix targets one or more diagnostic IDs. Whenever Visual Studio sees a diagnostic with a matching ID, it will ask your code fix provider if you have code fixes to offer. Loose coupling based on the ID string allows one analyzer to provide a fix for a diagnostic produced by someone else’s analyzer, or even a fix for built-in compiler errors and warnings.

In this case, your analyzer produces both the diagnostic and the code fix. You can see that the FixableDiagnosticIds property already returns the DiagnosticId constant you defined in your RegexAnalyzerAnalyzer class, so there’s nothing to change here.

RegisterCodeFixesAsync Method

The RegisterCodeFixesAsync method is the main driver for the code fix. This method is called whenever one or more matching Diagnostics are found for a given span of code.

You can see that the template’s default implementation of the RegisterCodeFixesAsync method pulls out the first diagnostic from the context (in most cases, you only expect one), and gets the diagnostic’s span. The next line then searches up the syntax tree from that span to find the nearest type declaration node. In the case of the default template’s rule, that’s the relevant node whose contents needed fixing.

In your case, the diagnostic analyzer you wrote was looking for invocations to see if they were calls to Regex.Match. To help share logic between your diagnostic and your code fix, change the type mentioned in the tree search’s OfType filter to find that same InvocationExpressionSyntax node. Rename the local variable to “invocationExpr,” as well:

var invocationExpr = root.FindToken(
  diagnosticSpan.Start).Parent.AncestorsAndSelf()
  .OfType<InvocationExpressionSyntax>().First();

You now have a reference to the same invocation node with which the diagnostic analyzer started. In the next statement, you pass this node to the method that will calculate the code changes you’ll be suggesting for this fix. Rename that method from MakeUppercaseAsync to FixRegexAsync, update the title string to "Fix regex" and use this string for both the fix description and the equivalenceKey:

context.RegisterCodeFix(
  CodeAction.Create(title, c => FixRegexAsync(
  context.Document, invocationExpr, c), equivalenceKey: title), diagnostic);

Each call to the context’s RegisterCodeFix method associates a new code action with the diagnostic squiggle in question, and will produce a menu item inside the light bulb. Note that you’re not actually calling the FixRegexAsync method that performs the code transform yet. Instead, the method call is wrapped in a lambda expression that Visual Studio can call later. This is because the result of your transform is only needed when the user actually selects your Fix regex item. When the fix item is highlighted or chosen, Visual Studio invokes your action to generate a preview or apply the fix. Until then, Visual Studio avoids running your fix method, just in case you’re performing expensive operations, such as solution-wide renames.

Note that a code fix provider isn’t obligated to produce a code fix for every instance of a given diagnostic. It’s often the case that you have a fix to suggest only for some of the cases your analyzer can squiggle. If you’ll only have fixes some of the time, you should first test in RegisterCodeFixesAsync any conditions you need to determine whether you can fix the specific situation. If those conditions aren’t met, you should return from RegisterCodeFixesAsync without calling RegisterCodeFix.
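As an illustration, a guard like the following could run in RegisterCodeFixesAsync right after you locate invocationExpr. It is a minimal sketch, assuming a reference to the Microsoft.CodeAnalysis.CSharp package; the FixGuards class and its method are hypothetical, not part of the project template:

```csharp
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper; assumes Microsoft.CodeAnalysis.CSharp is referenced.
public static class FixGuards
{
    // True only when the call's second argument is a plain string literal,
    // the one shape that FixRegexAsync knows how to replace.
    public static bool CanOfferRegexFix(InvocationExpressionSyntax invocationExpr)
    {
        var arguments = invocationExpr.ArgumentList.Arguments;
        if (arguments.Count < 2) return false;
        return arguments[1].Expression.IsKind(SyntaxKind.StringLiteralExpression);
    }
}
```

In RegisterCodeFixesAsync you would write `if (!FixGuards.CanOfferRegexFix(invocationExpr)) return;` before calling RegisterCodeFix. Because this sample’s analyzer only reports the diagnostic on string literals, the check always passes here; it starts to matter once the analyzer flags cases the fixer can’t handle.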

For this example, you’ll offer a fix for all instances of the diagnostic, so there are no more conditions to check.

FixRegexAsync Method

Now you get to the heart of the code fix. The FixRegexAsync method as currently written takes a Document and produces an updated Solution. While diagnostic analyzers look at specific nodes and symbols, code fixes can change code across the entire solution. You can see that the template code here is calling Renamer.RenameSymbolAsync, which changes not just the symbol’s type declaration, but also any references to that symbol throughout the solution.

In this case, you only intend to make local changes to the pattern string in the current document, so you can change the method’s return type from Task<Solution> to Task<Document>. This signature is still compatible with the lambda expression in RegisterCodeFixesAsync, as CodeAction.Create has another overload whose delegate returns a Document rather than a Solution. You’ll also need to update the typeDecl parameter to match the InvocationExpressionSyntax node you’re passing in from the RegisterCodeFixesAsync method:

private async Task<Document> FixRegexAsync(Document document,
  InvocationExpressionSyntax invocationExpr, CancellationToken cancellationToken)

Because you don’t need any of the “make uppercase” logic, delete the body of the method, as well.

Finding the Node to Replace

The first half of your fixer will look much like the first half of your diagnostic analyzer: you need to dig into the InvocationExpression to find the relevant parts of the method call that will inform your fix. In fact, you can just copy in the first half of the AnalyzeNode method down to the try-catch block. Skip the first line, though, as you already take invocationExpr as a parameter. Because you know this is code for which you’ve successfully found a diagnostic, you can remove all of the “if” checks. The only other change to make is to fetch the semantic model from the Document argument, as you no longer have a context that provides the semantic model directly.

When you finish those changes, the body of your FixRegexAsync method should look like this:

var semanticModel = await document.GetSemanticModelAsync(cancellationToken);
var memberAccessExpr =
  invocationExpr.Expression as MemberAccessExpressionSyntax;
var memberSymbol =
  semanticModel.GetSymbolInfo(memberAccessExpr).Symbol as IMethodSymbol;
var argumentList = invocationExpr.ArgumentList as ArgumentListSyntax;
var regexLiteral =
  argumentList.Arguments[1].Expression as LiteralExpressionSyntax;
var regexOpt = semanticModel.GetConstantValue(regexLiteral);
var regex = regexOpt.Value as string;

Generating the Replacement Node

Now that you again have regexLiteral, which represents your old string literal, you need to generate the new one. Calculating exactly what string you need to fix an arbitrary regex pattern is a large task that’s far beyond the scope of this article. As a stand-in for now, you’ll just use the string “valid regex,” which is indeed a valid regex pattern string. If you decide to go further on your own, you should start small and target your fix at very particular regex problems.
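To illustrate what “start small” might look like, here is a hypothetical helper in plain .NET (no Roslyn) that targets one narrow problem: a pattern with more opening than closing parentheses. It reuses the analyzer’s validity test (calling Regex.Match and catching ArgumentException) and only proposes a candidate that actually parses. The RegexFixSketch class is an assumption for illustration, and its parenthesis counting is naive: it ignores escaped parentheses and character classes, so treat it as a starting point rather than a general fix:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, deliberately narrow fixer for unclosed groups.
public static class RegexFixSketch
{
    // Same validity test the analyzer uses: Regex throws ArgumentException
    // for an invalid pattern.
    public static bool IsValid(string pattern)
    {
        try
        {
            Regex.Match("", pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    // Returns the pattern unchanged if it is already valid, a repaired pattern
    // if appending ')' characters makes it valid, or null if this fix does not apply.
    public static string TryFixUnclosedGroups(string pattern)
    {
        if (IsValid(pattern)) return pattern;

        // Naive count: escaped parens and parens inside [...] are not special-cased.
        int open = 0, close = 0;
        foreach (char ch in pattern)
        {
            if (ch == '(') open++;
            else if (ch == ')') close++;
        }
        if (open <= close) return null; // not the problem this fix targets

        string candidate = pattern + new string(')', open - close);
        return IsValid(candidate) ? candidate : null;
    }
}
```

If a candidate comes back, you could pass it to SyntaxFactory.Literal(candidate) instead of using the fixed “valid regex” string; if it returns null, RegisterCodeFixesAsync could simply skip registering the fix, as described earlier.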

The low-level way to produce new syntax nodes to substitute into your tree is through the members on SyntaxFactory. These methods let you create every type of syntax node in exactly the shape you choose. However, often it proves easier to just parse the expression you want from text, letting the compiler do all the heavy lifting to create the nodes. To parse a snippet of code, just call SyntaxFactory.ParseExpression and specify the code for a string literal:

var newLiteral = SyntaxFactory.ParseExpression("\"valid regex\"");
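For comparison, here is a minimal sketch of both routes, assuming a reference to the Microsoft.CodeAnalysis.CSharp package (LiteralConstruction is a hypothetical class name). Building the node by hand means choosing the node kind and creating the token yourself; parsing lets the compiler make those choices:

```csharp
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo class; assumes Microsoft.CodeAnalysis.CSharp is referenced.
public static class LiteralConstruction
{
    // Let the parser build the node from source text.
    public static ExpressionSyntax Parsed() =>
        SyntaxFactory.ParseExpression("\"valid regex\"");

    // Build the same node piece by piece. SyntaxFactory.Literal(string)
    // creates a string-literal token, adding the quotes and any escaping.
    public static ExpressionSyntax Built() =>
        SyntaxFactory.LiteralExpression(
            SyntaxKind.StringLiteralExpression,
            SyntaxFactory.Literal("valid regex"));
}
```

Both methods yield a StringLiteralExpression whose text is "valid regex" in quotes. For a single literal the two approaches are about equally short; parsing pays off as the expression you need grows.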

This new literal would work well as a replacement in most cases, but it’s missing something. If you recall, syntax tokens can have attached trivia that represents whitespace or comments. You’ll need to copy over any trivia from the old literal expression to ensure you don’t delete any spacing or comments from the old code. It’s also good practice to tag new nodes you create with Formatter.Annotation, which informs the code fix engine that you want your new node formatted according to the end user’s code style settings. You’ll need to add a using directive for the Microsoft.CodeAnalysis.Formatting namespace. With these additions, your ParseExpression call looks like this:

var newLiteral = SyntaxFactory.ParseExpression("\"valid regex\"")
  .WithLeadingTrivia(regexLiteral.GetLeadingTrivia())
  .WithTrailingTrivia(regexLiteral.GetTrailingTrivia())
  .WithAdditionalAnnotations(Formatter.Annotation);
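To see why the trivia matters, here is a sketch, assuming a reference to the Microsoft.CodeAnalysis.CSharp package (TriviaDemo is hypothetical), that replaces a literal followed by a comment twice: once with the bare parsed literal and once with the trivia copied over. It leaves out Formatter.Annotation because the Formatter type lives in the Workspaces layer and formatting is applied by the code fix engine, not by ReplaceNode:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo class; assumes Microsoft.CodeAnalysis.CSharp is referenced.
public static class TriviaDemo
{
    public static string[] ReplaceLiteral()
    {
        // The space and comment after "[" are trailing trivia of that literal's token.
        var statement = SyntaxFactory.ParseStatement(
            "var m = Regex.Match(input, \"[\" /* TODO */);");
        var oldLiteral = statement.DescendantNodes()
            .OfType<LiteralExpressionSyntax>()
            .Single();

        var bare = SyntaxFactory.ParseExpression("\"valid regex\"");
        var withTrivia = bare
            .WithLeadingTrivia(oldLiteral.GetLeadingTrivia())
            .WithTrailingTrivia(oldLiteral.GetTrailingTrivia());

        return new[]
        {
            statement.ReplaceNode(oldLiteral, bare).ToFullString(),       // comment lost
            statement.ReplaceNode(oldLiteral, withTrivia).ToFullString()  // comment kept
        };
    }
}
```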

Swapping the New Node into the Syntax Tree

Now that you have a new syntax node for the string literal, you can replace the old node within the syntax tree, producing a new tree with a fixed regex pattern string.

First, you get the root node from the current document’s syntax tree:

var root = await document.GetSyntaxRootAsync(cancellationToken);

Now, you can call the ReplaceNode method on that syntax root to swap out the old syntax node and swap in the new one:

var newRoot = root.ReplaceNode(regexLiteral, newLiteral);

Remember that you’re generating a new root node here. Replacing any syntax node also requires you to replace its parents all the way up to the root. As you saw before, all syntax nodes in the .NET Compiler Platform are immutable. This replacement operation actually just returns a new root syntax node with the target node and its ancestors replaced as directed.

Now that you have a new syntax root with a fixed string literal, you can walk up one more level of the tree to produce a new Document object that contains your updated root. To replace the root, use the WithSyntaxRoot method on the Document:

var newDocument = document.WithSyntaxRoot(newRoot);

This is the same With API pattern you just saw when calling WithLeadingTrivia and other methods on the expression you parsed. You’ll see this With pattern often when transforming existing objects in the Roslyn immutable object model. The idea is similar to the .NET String.Replace method that returns a new string object.
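The same With pattern applies one level up, at the Document. The following sketch assumes a reference to the Microsoft.CodeAnalysis.CSharp.Workspaces package (for AdhocWorkspace). It builds a throwaway in-memory document outside Visual Studio, swaps in a fixed literal, and compares the text of the old and new Document objects. The DocumentWithDemo class and the project and file names are hypothetical; inside a real code fix, Visual Studio hands you the Document, so you never create a workspace yourself:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical demo; assumes Microsoft.CodeAnalysis.CSharp.Workspaces is referenced.
public static class DocumentWithDemo
{
    public static async Task<string[]> ReplacePatternAsync()
    {
        // An in-memory workspace standing in for the one Visual Studio provides.
        using (var workspace = new AdhocWorkspace())
        {
            var project = workspace.AddProject("Demo", LanguageNames.CSharp);
            var document = project.AddDocument("Demo.cs", SourceText.From(
                "class C { void M() { Regex.Match(\"\", \"[\"); } }"));

            var root = await document.GetSyntaxRootAsync();

            // The pattern is the last string literal in this snippet.
            var oldLiteral = root.DescendantNodes()
                .OfType<LiteralExpressionSyntax>()
                .Last();

            // WithTriviaFrom copies both leading and trailing trivia in one call.
            var newLiteral = SyntaxFactory.ParseExpression("\"valid regex\"")
                .WithTriviaFrom(oldLiteral);

            // WithSyntaxRoot returns a NEW Document; 'document' keeps its old text.
            var newDocument = document.WithSyntaxRoot(root.ReplaceNode(oldLiteral, newLiteral));

            var oldText = (await document.GetTextAsync()).ToString();
            var newText = (await newDocument.GetTextAsync()).ToString();
            return new[] { oldText, newText };
        }
    }
}
```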

With the transformed document in hand, you can now return it from FixRegexAsync:

return newDocument;

Your code in CodeFixProvider.cs should now look like Figure 5.

Figure 5 The Complete Code for CodeFixProvider.cs

using System.Collections.Immutable;
using System.Composition;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CodeFixes;
using Microsoft.CodeAnalysis.CodeActions;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Formatting;

namespace RegexAnalyzer
{
  [ExportCodeFixProvider(LanguageNames.CSharp, Name = nameof(RegexAnalyzerCodeFixProvider)), Shared]
  public class RegexAnalyzerCodeFixProvider : CodeFixProvider
  {
    private const string title = "Fix regex";

    public sealed override ImmutableArray<string> FixableDiagnosticIds
    {
      get { return ImmutableArray.Create(RegexAnalyzerAnalyzer.DiagnosticId); }
    }

    public sealed override FixAllProvider GetFixAllProvider()
    {
      return WellKnownFixAllProviders.BatchFixer;
    }

    public sealed override async Task RegisterCodeFixesAsync(CodeFixContext context)
    {
      var root = 
        await context.Document.GetSyntaxRootAsync(context.CancellationToken)
        .ConfigureAwait(false);

      var diagnostic = context.Diagnostics.First();
      var diagnosticSpan = diagnostic.Location.SourceSpan;

      // Find the invocation expression identified by the diagnostic.
      var invocationExpr =    
        root.FindToken(diagnosticSpan.Start).Parent.AncestorsAndSelf()
        .OfType<InvocationExpressionSyntax>().First();

      // Register a code action that will invoke the fix.
      context.RegisterCodeFix(
        CodeAction.Create(title, c => 
        FixRegexAsync(context.Document, invocationExpr, c), equivalenceKey: title), diagnostic);
    }

    private async Task<Document> FixRegexAsync(Document document, 
      InvocationExpressionSyntax invocationExpr, 
      CancellationToken cancellationToken)
    {
      var semanticModel = 
        await document.GetSemanticModelAsync(cancellationToken);

      var memberAccessExpr = 
        invocationExpr.Expression as MemberAccessExpressionSyntax;
      var memberSymbol = 
        semanticModel.GetSymbolInfo(memberAccessExpr).Symbol as IMethodSymbol;
      var argumentList = invocationExpr.ArgumentList as ArgumentListSyntax;
      var regexLiteral = 
        argumentList.Arguments[1].Expression as LiteralExpressionSyntax;
      var regexOpt = semanticModel.GetConstantValue(regexLiteral);
      var regex = regexOpt.Value as string;

      var newLiteral = SyntaxFactory.ParseExpression("\"valid regex\"")
        .WithLeadingTrivia(regexLiteral.GetLeadingTrivia())
        .WithTrailingTrivia(regexLiteral.GetTrailingTrivia())
        .WithAdditionalAnnotations(Formatter.Annotation);
      var root = await document.GetSyntaxRootAsync(cancellationToken);
      var newRoot = root.ReplaceNode(regexLiteral, newLiteral);

      var newDocument = document.WithSyntaxRoot(newRoot);

      return newDocument;
    }
  }
}

Trying It Out

That’s it! You’ve now defined a code fix whose transform runs when users encounter your diagnostic and choose the fix from the light bulb menu. To try out the code fix, press F5 again in the main instance of Visual Studio and open up the console application. This time, when you place the cursor on your squiggle, you should see a light bulb appear to the left. Clicking on the light bulb should bring up a menu that contains the Fix regex code action you defined, as shown in Figure 6. This menu shows a preview with an inline diff between the old Document and the new Document you created, which represents the state of your code if you choose to apply the fix.

Figure 6 Trying Out Your Code Fix

If you select that menu item, Visual Studio takes the new Document and adopts it as the current state of the editor buffer for that source file. Your fix has now been applied!

Congratulations

You’ve done it! In about 70 total lines of new code, you identified an issue in your user’s code, live, as he’s typing, squiggled it red as an error, and surfaced a code fix that can clean it up. You transformed syntax trees and generated new syntax nodes along the way, all while operating within your familiar target domain of regular expressions.

While you can continuously refine the diagnostics and code fixes you write, I’ve found that analyzers built with the .NET Compiler Platform let you get quite a lot done within a short amount of time. Once you get comfortable building analyzers, you’ll start to spot all sorts of common problems in your daily coding life and detect repetitive fixes you can automate.

What will you analyze?


Alex Turner is a senior program manager for the Managed Languages team at Microsoft, where he’s been brewing up C# and Visual Basic goodness on the .NET Compiler Platform (“Roslyn”) project. He graduated with a Master of Science in Computer Science from Stony Brook University and has spoken at Build, PDC, TechEd, TechDays and MIX.

Thanks to the following Microsoft technical experts for reviewing this article: Bill Chiles and Lucian Wischik
Lucian Wischik is on the VB/C# language design team at Microsoft, with particular responsibility for VB. Before joining Microsoft he worked in academia on concurrency theory and async. He's a keen sailor and long-distance swimmer.

Bill Chiles worked on languages (CMU Common Lisp, Dylan, IronPython and C#) and developer tools for most of his career. He spent the last 17 years in Microsoft's Developer Division working on everything from core Visual Studio features, to the Dynamic Language Runtime, to C#.