February 2016

Volume 31 Number 2

[C#]

Customizable Scripting in C#

By Vassili Kaplan

In this article I’ll show you how to build a custom scripting language using C#—without using any external libraries. The scripting language is based on the Split-And-Merge algorithm for parsing mathematical expressions in C# I presented in the October 2015 issue of MSDN Magazine (msdn.com/magazine/mt573716).

By using custom functions, I can extend the Split-And-Merge algorithm to parse not only a mathematical expression, but also to parse a customizable scripting language. The “standard” language control flow statements (if, else, while, continue, break and so on) can be added as custom functions, as can other typical scripting language functionality (OS commands, string manipulation, searching for files and so on).

I’m going to call my language Customizable Scripting in C#, or CSCS. Why would I want to create yet another scripting language? Because it’s an easy language to customize. Adding a new function or a new control flow statement that takes an arbitrary number of parameters takes just a few lines of code. Moreover, the function names and the control flow statements can be used in any non-English language scenario with just some configuration changes, which I’ll also show in this article. And by seeing how the CSCS language is implemented, you’ll be able to create your own custom scripting language.

The Scope of CSCS

It’s fairly simple to implement a very basic scripting language, but brutally difficult to implement a five-star language. I’m going to limit the scope of CSCS here so you’ll know what to expect:

  • The CSCS language has if, else if, else, while, continue and break control flow statements. Nested statements are supported, as well. You’ll learn how to add additional control statements on the fly.
  • There are no Boolean values. Instead of writing “if (a)”, you have to write “if (a == 1)”.
  • Logical operators aren’t supported. Instead of writing “if (a == 1 and b == 2)”, you write nested ifs: “if (a == 1) { if (b == 2) { … } }”.
  • Functions and methods can’t be defined in CSCS itself, but they can be written in C# and registered with the Split-And-Merge Parser in order to be used with CSCS.
  • Only “//”-style comments are supported.
  • Variables and one-dimensional arrays are supported, all defined at the global level. A variable can hold a number, a string or a tuple (implemented as a list) of other variables. Multi-dimensional arrays are not supported.

Figure 1 shows a “Hello, World!” program in CSCS. Due to a mistyping of “print,” the program displays an error at the end: “Couldn’t parse token [pint].” Note that all the previous statements executed successfully; that is, CSCS is an interpreter.

Figure 1 “Hello, World!” in CSCS

Modifications to the Split-And-Merge Algorithm

I’ve made two changes to the Split part of the Split-And-Merge algorithm. (The Merge part remains the same.)

The first change is that the result of parsing an expression can now be a number, a string or a tuple of values (each of which can be either a string or a number), rather than just a number. I created the following Parser.Result class to store the result of applying the Split-And-Merge algorithm:

public class Result
{
  public Result(double dRes = Double.NaN,
    string sRes = null,
    List<Result> tRes = null)
  {
    // Assign from the constructor parameters.
    Value  = dRes;
    String = sRes;
    Tuple  = tRes;
  }
  public double       Value  { get; set; }
  public string       String { get; set; }
  public List<Result> Tuple  { get; set; }
}

The second modification is that now the splitting part is performed not just until a stop-parsing character—) or \n—is found, but until any character in a passed array of stop-parsing characters is found. This is necessary, for example, when parsing the first argument of an If statement, where the separator can be any <, >, or = character.
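To illustrate the idea of stopping at any character from a passed array, here is a minimal sketch. It is not the code from the download: the StopCharDemo class and its ReadUntil helper are hypothetical names, and the sketch only extracts the raw text of a token rather than evaluating it.

```csharp
using System;

public static class StopCharDemo
{
    // Hypothetical helper: returns the text between 'from' and the first
    // stop character found, and advances 'from' to that character.
    public static string ReadUntil(string data, ref int from, char[] stopChars)
    {
        int end = data.IndexOfAny(stopChars, from);
        if (end < 0)
        {
            end = data.Length; // No stop character: consume the rest.
        }
        string token = data.Substring(from, end - from);
        from = end;
        return token;
    }
}
```

For the condition text a+1<=b) and the stop characters <, > and =, a call starting at position 0 would return a+1 and leave the position on the < character, ready for the comparison sign to be read.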

You can take a look at the modified Split-And-Merge algorithm in the accompanying source code download.

The Interpreter

The class responsible for interpreting the CSCS code is called Interpreter. It’s implemented as a singleton, that is, a class definition where there can be only one instance of the class. In its Init method, the Parser (see the original article mentioned earlier) is initialized with all the functions used by the Interpreter:

public void Init()
{
  ParserFunction.AddFunction(Constants.IF,
    new IfStatement(this));
  ParserFunction.AddFunction(Constants.WHILE,
    new WhileStatement(this));
  ParserFunction.AddFunction(Constants.CONTINUE,
    new ContinueStatement());
  ParserFunction.AddFunction(Constants.BREAK,
    new BreakStatement());
  ParserFunction.AddFunction(Constants.SET,
    new SetVarFunction());
  ...
}

In the Constants.cs file, the actual names used in CSCS are defined:

...
public const string IF          = "if";
public const string ELSE        = "else";
public const string ELSE_IF     = "elif";
public const string WHILE       = "while";
public const string CONTINUE    = "continue";
public const string BREAK       = "break";
public const string SET         = "set";

Any function registered with the Parser must be implemented as a class derived from the ParserFunction class and must override its Evaluate method.
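The register-and-override pattern can be sketched in isolation. The code below is a simplified stand-in, not the real ParserFunction from the download: MiniFunction and TwiceFunction are hypothetical names, and Evaluate returns a plain double instead of a Parser.Result to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Simplified, hypothetical stand-in for the article's ParserFunction class.
public abstract class MiniFunction
{
    private static readonly Dictionary<string, MiniFunction> s_functions =
        new Dictionary<string, MiniFunction>();

    public static void AddFunction(string name, MiniFunction function)
    {
        s_functions[name] = function; // Re-registering a name replaces it.
    }

    public static MiniFunction GetFunction(string name)
    {
        MiniFunction function;
        return s_functions.TryGetValue(name, out function) ? function : null;
    }

    // Derived classes read their arguments from 'data' starting at 'from'.
    public abstract double Evaluate(string data, ref int from);
}

// A hypothetical function that reads one number up to ')' and doubles it.
public class TwiceFunction : MiniFunction
{
    public override double Evaluate(string data, ref int from)
    {
        int end = data.IndexOf(')', from);
        if (end < 0)
        {
            throw new ArgumentException("Missing closing parenthesis");
        }
        double arg = double.Parse(data.Substring(from, end - from),
                                  CultureInfo.InvariantCulture);
        from = end + 1; // Continue parsing after the closing parenthesis.
        return 2 * arg;
    }
}
```

After MiniFunction.AddFunction("twice", new TwiceFunction()), a parser that meets the token twice can look the object up by name and hand it the rest of the script, which is the same division of labor the Interpreter relies on.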

The first thing the Interpreter does when starting to work on a script is to simplify the script by removing all white spaces (unless they’re inside of a string), and all comments. Therefore, spaces or new lines can’t be used as statement separators. The statement separator character and the comment string are defined in Constants.cs, as well:

public const char END_STATEMENT = ';';
public const string COMMENT     = "//";
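The preprocessing step might look roughly like the following sketch. It is an assumption about the approach, not the code from the download: ScriptCleaner is a hypothetical name, the comment marker is hardcoded as // rather than read from Constants, and escaped quotes inside strings are not handled.

```csharp
using System;
using System.Text;

public static class ScriptCleaner
{
    // Removes whitespace and "//" comments that appear outside of
    // double-quoted strings. Escaped quotes are not handled.
    public static string Clean(string script)
    {
        var result = new StringBuilder(script.Length);
        bool inString = false;
        for (int i = 0; i < script.Length; i++)
        {
            char ch = script[i];
            if (ch == '"')
            {
                inString = !inString;
                result.Append(ch);
                continue;
            }
            if (inString)
            {
                result.Append(ch); // Keep everything inside a string.
                continue;
            }
            if (ch == '/' && i + 1 < script.Length && script[i + 1] == '/')
            {
                // Skip to the end of the line; the loop increment then
                // steps over the newline itself.
                while (i < script.Length && script[i] != '\n')
                {
                    i++;
                }
                continue;
            }
            if (!char.IsWhiteSpace(ch))
            {
                result.Append(ch);
            }
        }
        return result.ToString();
    }
}
```

Because a // sequence inside a string is copied verbatim, a string argument such as a URL survives the cleanup.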

Variables and Arrays

CSCS supports numbers (type double), strings or tuples (arrays of variables implemented as a C# list). Each element of a tuple can be either a string or a number, but not another tuple. Therefore, multidimensional arrays are not supported. To define a variable, the CSCS function “set” is used. The C# class SetVarFunction implements the functionality of setting a variable value, as shown in Figure 2.

Figure 2 Implementation of the Set Variable Function

class SetVarFunction : ParserFunction
{
  protected override Parser.Result Evaluate(string data, ref int from)
  {
    string varName = Utils.GetToken(data, ref from, Constants.NEXT_ARG_ARRAY);
    if (from >= data.Length)
    {
      throw new ArgumentException("Couldn't set variable before end of line");
    }
    Parser.Result varValue = Utils.GetItem(data, ref from);
    // Check if the variable to be set has the form of x(i),
    // meaning that this is an array element.
    int arrayIndex = Utils.ExtractArrayElement(ref varName);
    if (arrayIndex >= 0)
    {
      bool exists = ParserFunction.FunctionExists(varName);
      Parser.Result  currentValue = exists ?
            ParserFunction.GetFunction(varName).GetValue(data, ref from) :
            new Parser.Result();
      List<Parser.Result> tuple = currentValue.Tuple == null ?
                                  new List<Parser.Result>() :
                                  currentValue.Tuple;
      if (tuple.Count > arrayIndex)
      {
        tuple[arrayIndex] = varValue;
      }
      else
      {
        for (int i = tuple.Count; i < arrayIndex; i++)
        {
          tuple.Add(new Parser.Result(Double.NaN, string.Empty));
        }
        tuple.Add(varValue);
      }
      varValue = new Parser.Result(Double.NaN, null, tuple);
    }
    ParserFunction.AddFunction(varName, new GetVarFunction(varValue));
    return new Parser.Result(Double.NaN, varName);
  }
}

Here are some examples of defining a variable in CSCS:

set(a, "2 + 3");  // a will be equal to the string "2 + 3"
set(b, 2 + 3);    // b will be equal to the number 5
set(c(2), "xyz"); // c will be initialized as a tuple of size 3 with c(0) = c(1) = ""

Note that there’s no special declaration of an array: just defining a variable with an index will initialize the array if it’s not already initialized, and add empty elements to it, if necessary. In the preceding example, the elements c(0) and c(1) were added, both initialized to empty strings. This eliminates, in my view, the unnecessary step that’s required in most scripting languages of declaring an array first.
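The padding behavior can be shown on its own with a plain list of strings. This is a simplified sketch rather than the article's code: TupleGrowth and SetElement are hypothetical names, and string elements stand in for Parser.Result objects.

```csharp
using System;
using System.Collections.Generic;

public static class TupleGrowth
{
    // Sets tuple[index], first padding the list with empty strings
    // when the index lies beyond its current end.
    public static void SetElement(List<string> tuple, int index, string value)
    {
        if (index < 0)
        {
            throw new ArgumentException("Index must not be negative");
        }
        while (tuple.Count < index)
        {
            tuple.Add(string.Empty);
        }
        if (index < tuple.Count)
        {
            tuple[index] = value; // Overwrite an existing element.
        }
        else
        {
            tuple.Add(value);     // index == tuple.Count: append.
        }
    }
}
```

Calling SetElement on an empty list with index 2 and the value "xyz" leaves a list of three elements, the first two being empty strings, which mirrors the set(c(2), "xyz") example.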

All CSCS variables and arrays are created using CSCS functions (like set or append). They’re all defined with global scope and can be used later just by calling the variable name or a variable with an index. In C#, this is implemented in the GetVarFunction shown in Figure 3.

Figure 3 Implementation of the Get Variable Function

class GetVarFunction : ParserFunction
{
  internal GetVarFunction(Parser.Result value)
  {
    m_value = value;
  }
  protected override Parser.Result Evaluate(string data, ref int from)
  {
    // First check if this element is part of an array:
    if (from < data.Length && data[from - 1] == Constants.START_ARG)
    {
      // There is an index given - it may be for an element of the tuple.
      if (m_value.Tuple == null || m_value.Tuple.Count == 0)
      {
        throw new ArgumentException("No tuple exists for the index");
      }
      Parser.Result index = Utils.GetItem(data, ref from, true /* expectInt */);
      if (index.Value < 0 || index.Value >= m_value.Tuple.Count)
      {
        throw new ArgumentException("Incorrect index [" + index.Value +
          "] for tuple of size " + m_value.Tuple.Count);
      }
      return m_value.Tuple[(int)index.Value];
    }
    // This is the case for a simple variable, not an array:
    return m_value;
  }
  private Parser.Result m_value;
}

Only the set variable function must be registered with the Parser:

ParserFunction.AddFunction(Constants.SET, new SetVarFunction());

The get variable function is registered inside of the set variable function C# code (see the next-to-last statement in Figure 2):

ParserFunction.AddFunction(varName, new GetVarFunction(varValue));

Some examples of getting variables in CSCS are:

append(a, " + 5"); // a will be equal to the string "2 + 3 + 5"
set(b, b * 2);    // b will be equal to the number 10 (if it was 5 before)

Control Flow: If, Else If, Else

The If, Else If and Else control flow statements are implemented internally as Parser functions, as well. They are registered with the Parser just like any other function:

ParserFunction.AddFunction(Constants.IF, new IfStatement(this));

Only the IF keyword must be registered with the Parser. ELSE_IF and ELSE statements will be processed inside of the IfStatement implementation:

class IfStatement : ParserFunction
{
  internal IfStatement(Interpreter interpreter)
  {
    m_interpreter = interpreter;
  }
  protected override Parser.Result Evaluate(string data, ref int from)
  {
    m_interpreter.CurrentChar = from;
    Parser.Result result = m_interpreter.ProcessIf();
    return result;
  }
  private Interpreter m_interpreter;
}

The real implementation of the If statement is in the Interpreter class, as shown in Figure 4.

Figure 4 Implementation of the If Statement

internal Parser.Result ProcessIf()
{
  int startIfCondition = m_currentChar;
  Parser.Result result = null;
  Parser.Result arg1 = GetNextIfToken();
  string comparison  = Utils.GetComparison(m_data, ref m_currentChar);
  Parser.Result arg2 = GetNextIfToken();
  bool isTrue = EvalCondition(arg1, comparison, arg2);
  if (isTrue)
  {
    result = ProcessBlock();
    if (result is Continue || result is Break)
    {
      // Got here from the middle of the if-block. Skip it.
      m_currentChar = startIfCondition;
      SkipBlock();
    }
    SkipRestBlocks();
    return result;
  }
  // We are in Else. Skip everything in the If statement.
  SkipBlock();
  int endOfToken = m_currentChar;
  string nextToken = Utils.GetNextToken(m_data, ref endOfToken);
  if (ELSE_IF_LIST.Contains(nextToken))
  {
    m_currentChar = endOfToken + 1;
    result = ProcessIf();
  }
  else if (ELSE_LIST.Contains(nextToken))
  {
    m_currentChar = endOfToken + 1;
    result = ProcessBlock();
  }
  return result != null ? result : new Parser.Result();
}

The code explicitly assumes that the If condition has the form: argument 1, comparison sign, argument 2:

Parser.Result arg1 = GetNextIfToken();
string comparison  = Utils.GetComparison(m_data, ref m_currentChar);
Parser.Result arg2 = GetNextIfToken();
bool isTrue = EvalCondition(arg1, comparison, arg2);

This is where optional AND, OR or NOT statements can be added.

The EvalCondition function just compares the tokens according to the comparison sign:

internal bool EvalCondition(Parser.Result arg1, string comparison, Parser.Result arg2)
{
  bool compare = arg1.String != null ? CompareStrings(arg1.String, comparison, arg2.String) :
                                       CompareNumbers(arg1.Value, comparison, arg2.Value);
  return compare;
}

Here’s the implementation of a numerical comparison:

internal bool CompareNumbers(double num1, string comparison, double num2)
{
  switch (comparison) {
    case "==": return num1 == num2;
    case "<>": return num1 != num2;
    case "<=": return num1 <= num2;
    case ">=": return num1 >= num2;
    case "<" : return num1 <  num2;
    case ">" : return num1 >  num2;
    default: throw new ArgumentException("Unknown comparison: " + comparison);
  }
}

The string comparison is similar and is available in the accompanying code download, as is the straightforward implementation of the GetNextIfToken function.
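For readers without the download at hand, a string comparison in the same style could look like the sketch below. It is an assumption, not the author's implementation: in particular, the choice of an ordinal (character code) comparison is mine, and the ComparisonSketch class name is hypothetical.

```csharp
using System;

public static class ComparisonSketch
{
    // Mirrors CompareNumbers, but orders strings by ordinal comparison.
    public static bool CompareStrings(string str1, string comparison, string str2)
    {
        int order = string.CompareOrdinal(str1, str2);
        switch (comparison)
        {
            case "==": return order == 0;
            case "<>": return order != 0;
            case "<=": return order <= 0;
            case ">=": return order >= 0;
            case "<":  return order <  0;
            case ">":  return order >  0;
            default: throw new ArgumentException("Unknown comparison: " + comparison);
        }
    }
}
```

A culture-aware comparison (string.Compare with a StringComparison value) would be an equally valid choice if scripts need linguistic ordering.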

When an if, else if, or else condition is true, all of the statements inside the block are processed. This is implemented in Figure 5 in the ProcessBlock method. If the condition isn’t true, all the statements are skipped. This is implemented in the SkipBlock method (see accompanying source code).

Figure 5 Implementation of the ProcessBlock Method

internal Parser.Result ProcessBlock()
{
  int blockStart = m_currentChar;
  Parser.Result result = null;
  while(true)
  {
    int endGroupRead = Utils.GoToNextStatement(m_data, ref m_currentChar);
    if (endGroupRead > 0)
    {
      return result != null ? result : new Parser.Result();
    }
    if (m_currentChar >= m_data.Length)
    {
      throw new ArgumentException("Couldn't process block [" +
                                   m_data.Substring(blockStart) + "]");
    }
    result = Parser.LoadAndCalculate(m_data, ref m_currentChar,
      Constants.END_PARSE_ARRAY);
    if (result is Continue || result is Break)
    {
      return result;
    }
  }
}
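The SkipBlock method isn't listed in this article. One plausible way to skip a block is to count braces while ignoring any that appear inside strings, as in the following sketch. BlockSkipper is a hypothetical name, and unlike the Interpreter method it takes the script and position as parameters instead of using m_data and m_currentChar.

```csharp
using System;

public static class BlockSkipper
{
    // 'from' must point at or before the opening '{' of the block.
    // Returns the index just past the matching '}'.
    public static int SkipBlock(string data, int from)
    {
        int depth = 0;
        bool inString = false;
        for (int i = from; i < data.Length; i++)
        {
            char ch = data[i];
            if (ch == '"')
            {
                inString = !inString;
                continue;
            }
            if (inString)
            {
                continue; // Braces inside strings don't count.
            }
            if (ch == '{')
            {
                depth++;
            }
            else if (ch == '}')
            {
                depth--;
                if (depth == 0)
                {
                    return i + 1;
                }
                if (depth < 0)
                {
                    throw new ArgumentException("Unexpected closing brace");
                }
            }
        }
        throw new ArgumentException("Couldn't skip block: unbalanced braces");
    }
}
```

Counting depth is what lets nested if and while blocks be skipped as a single unit.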

Note how the “Continue” and “Break” statements are used inside of the while loop. These statements are implemented as functions, as well. Here’s Continue:

class Continue : Parser.Result  { }
class ContinueStatement : ParserFunction
{
  protected override Parser.Result
    Evaluate(string data, ref int from)
  {
    return new Continue();
  }
}

The implementation of the Break statement is analogous. They’re both registered with the Parser like any other function:

ParserFunction.AddFunction(Constants.CONTINUE,  new ContinueStatement());
ParserFunction.AddFunction(Constants.BREAK,     new BreakStatement());

You can use the Break function to get out of nested If blocks or to get out of a while loop.

Control Flow: The While Loop

The while loop is also implemented and registered with the Parser as a function:

ParserFunction.AddFunction(Constants.WHILE,     new WhileStatement(this));

Whenever the while keyword is parsed, the Evaluate method of the WhileStatement object is called:

class WhileStatement : ParserFunction
{
  internal WhileStatement(Interpreter interpreter)
  {
    m_interpreter = interpreter;
  }
  protected override Parser.Result Evaluate(string data, ref int from)
  {
    m_interpreter.CurrentChar = from;
    m_interpreter.ProcessWhile();
    return new Parser.Result();
  }
  private Interpreter m_interpreter;
}

So the real implementation of the while loop is in the Interpreter class, as shown in Figure 6.

Figure 6 Implementation of the While Loop

internal void ProcessWhile()
{
  int startWhileCondition = m_currentChar;
  // A heuristic check against an infinite loop.
  int cycles = 0;
  int START_CHECK_INF_LOOP = CHECK_AFTER_LOOPS / 2;
  Parser.Result argCache1 = null;
  Parser.Result argCache2 = null;
  bool stillValid = true;
  while (stillValid)
  {
    m_currentChar = startWhileCondition;
    Parser.Result arg1 = GetNextIfToken();
    string comparison = Utils.GetComparison(m_data, ref m_currentChar);
    Parser.Result arg2 = GetNextIfToken();
    stillValid = EvalCondition(arg1, comparison, arg2);
    int startSkipOnBreakChar = m_currentChar;
    if (!stillValid)
    {
      break;
    }
    // Check for an infinite loop if same values are compared.
    if (++cycles % START_CHECK_INF_LOOP == 0)
    {
      if (cycles >= MAX_LOOPS || (arg1.IsEqual(argCache1) &&
        arg2.IsEqual(argCache2)))
      {
        throw new ArgumentException("Looks like an infinite loop after " +
          cycles + " cycles.");
      }
      argCache1 = arg1;
      argCache2 = arg2;
    }
    Parser.Result result = ProcessBlock();
    if (result is Break)
    {
      m_currentChar = startSkipOnBreakChar;
      break;
    }
  }
  // The while condition is not true anymore: must skip the whole while
  // block before continuing with next statements.
  SkipBlock();
}

Note that the while loop proactively checks for an infinite loop every CHECK_AFTER_LOOPS / 2 iterations, where CHECK_AFTER_LOOPS is a constant defined in the configuration settings. The heuristic is that if the exact same values in the while condition are compared at two consecutive checks, this could indicate an infinite loop; the loop is also stopped once MAX_LOOPS iterations are reached. Figure 7 shows a while loop where I forgot to increment the cycle variable i inside of the while loop.

Figure 7 Detecting an Infinite While Loop in CSCS
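The heuristic from Figure 6 can be isolated into a small helper to make its behavior easier to follow. The sketch below is a simplification under my own assumptions: LoopGuard is a hypothetical class, the operands are compared as strings rather than with Parser.Result.IsEqual, and the two limits are constructor arguments instead of configuration constants.

```csharp
using System;

public class LoopGuard
{
    private readonly int m_checkEvery;
    private readonly int m_maxLoops;
    private int m_cycles;
    private string m_cachedLeft;
    private string m_cachedRight;

    public LoopGuard(int checkAfterLoops, int maxLoops)
    {
        m_checkEvery = Math.Max(1, checkAfterLoops / 2);
        m_maxLoops = maxLoops;
    }

    // Call once per iteration with the two evaluated condition operands.
    public void Check(string left, string right)
    {
        if (++m_cycles % m_checkEvery != 0)
        {
            return; // Only inspect the operands every m_checkEvery cycles.
        }
        if (m_cycles >= m_maxLoops ||
            (left == m_cachedLeft && right == m_cachedRight))
        {
            throw new InvalidOperationException(
                "Looks like an infinite loop after " + m_cycles + " cycles.");
        }
        m_cachedLeft = left;
        m_cachedRight = right;
    }
}
```

With checkAfterLoops set to 10, a loop whose operands never change is caught at the second check (the tenth iteration), while a loop whose counter advances between checks keeps running until maxLoops. As the word heuristic suggests, a legitimate loop whose condition operands happen to repeat at two consecutive checks would be flagged, too.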

Functions, Functions, Functions

In order for CSCS to do more useful things, more flesh needs to be added; that is, more functions must be implemented. Adding a new function to CSCS is straightforward: First implement a class deriving from the ParserFunction class (overriding the Evaluate method) and then register it with the Parser. Here’s the implementation of the Print function:

class PrintFunction : ParserFunction
{
  internal PrintFunction(Interpreter interpreter)
  {
    m_interpreter = interpreter;
  }
  protected override Parser.Result Evaluate(string data, ref int from)
  {
    List<string> args = Utils.GetFunctionArgs(data, ref from);
    m_interpreter.AppendOutput(string.Join("", args.ToArray()));
    return new Parser.Result();
  }
  private Interpreter m_interpreter;
}

The function prints any number of comma-separated arguments passed to it. The actual reading of the arguments is done in the GetFunctionArgs auxiliary function, which returns all the passed arguments as a list of strings. You can take a look at the function in the accompanying source code.
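To give a feel for what reading the arguments involves, here is a rough sketch that only splits the raw argument text. It is not the GetFunctionArgs from the download, which also has to evaluate each argument; ArgSplitter and SplitArgs are hypothetical names, and the script is assumed to have had its whitespace removed already.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ArgSplitter
{
    // 'from' points just past the opening '('. On return it points just
    // past the matching ')'. Commas inside strings or nested parentheses
    // do not separate arguments.
    public static List<string> SplitArgs(string data, ref int from)
    {
        var args = new List<string>();
        var current = new StringBuilder();
        int depth = 0;
        bool inString = false;
        for (int i = from; i < data.Length; i++)
        {
            char ch = data[i];
            if (ch == '"')
            {
                inString = !inString;
            }
            else if (!inString)
            {
                if (ch == '(')
                {
                    depth++;
                }
                else if (ch == ')')
                {
                    if (depth == 0)
                    {
                        if (current.Length > 0 || args.Count > 0)
                        {
                            args.Add(current.ToString());
                        }
                        from = i + 1;
                        return args;
                    }
                    depth--;
                }
                else if (ch == ',' && depth == 0)
                {
                    args.Add(current.ToString());
                    current.Clear();
                    continue;
                }
            }
            current.Append(ch);
        }
        throw new ArgumentException("Missing closing parenthesis");
    }
}
```

For the call print("File ",i,": ",b(i)); this yields four raw arguments, the last of which is b(i), and leaves the position on the terminating semicolon.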

The second and last step is to register the Print function with the Parser in the program initialization part:

ParserFunction.AddFunction(Constants.PRINT,     new PrintFunction(this));

The Constants.PRINT constant is defined as “print”.

Figure 8 shows an implementation of a function that starts a new process.

Figure 8 Run Process Function Implementation

class RunFunction : ParserFunction
{
  internal RunFunction(Interpreter interpreter)
  {
    m_interpreter = interpreter;
  }
  protected override Parser.Result Evaluate(string data, ref int from)
  {
    string processName = Utils.GetItem(data, ref from).String;
    if (string.IsNullOrWhiteSpace(processName))
    {
      throw new ArgumentException("Couldn't extract process name");
    }
    List<string> args = Utils.GetFunctionArgs(data, ref from);
    int processId = -1;
    try
    {
      Process pr = Process.Start(processName, string.Join("", args.ToArray()));
      processId = pr.Id;
    }
    catch (System.ComponentModel.Win32Exception exc)
    {
      throw new ArgumentException("Couldn't start [" + processName + "]: " +
        exc.Message);
    }
    m_interpreter.AppendOutput("Process " + processName + " started, id: " +
      processId);
    return new Parser.Result(processId);
  }
  private Interpreter m_interpreter;
}

Here’s how you can find files, start and kill a process, and print some values in CSCS:

 

set(b, findfiles("*.cpp", "*.cs"));
set(i, 0);
while(i < size(b)) {
  print("File ", i, ": ", b(i));
  set(id, run("notepad", b(i)));
  kill(id);
  set(i, i+ 1);
}

Figure 9 lists the functions that are implemented in the downloadable source code, along with a brief description. Most of the functions are wrappers over corresponding C# functions.

Figure 9 CSCS Functions

Function   Description
abs        gets the absolute value of an expression
append     appends a string or a number (converted then to a string) to a string
cd         changes a directory
cd..       changes directory one level up
dir        shows the contents of the current directory
env        gets the contents of an environment variable
exp        exponential function
findfiles  finds files with a given pattern
findstr    finds files containing a string having a specific pattern
indexof    returns an index of a substring, or -1, if not found
kill       kills a process having a given process id number
pi         returns an approximation of the pi constant
pow        returns the first argument to the power of the second argument
print      prints a given list of arguments (numbers and lists are converted to strings)
psinfo     returns process information for a given process name
pstime     returns total processor time for this process; useful for measuring times
pwd        displays current directory pathname
run        starts a process with a given argument list and returns process id
setenv     sets the value of an environment variable
set        sets the value of a variable or of an array element
sin        returns the value of the sine of the given argument
size       returns the length of the string or the size of the list
sqrt       returns the square root of the given number
substr     returns the substring of the string starting from given index
tolower    converts a string to lowercase
toupper    converts a string to uppercase

Internationalization

Note that you can register multiple labels (function names) corresponding to the same function with the Parser. In this way, it’s possible to add any number of other languages.

Adding a translation consists of registering another string with the same C# object. The corresponding C# code follows:

var languagesSection =
  ConfigurationManager.GetSection("Languages") as NameValueCollection;
string languages = languagesSection["languages"];
foreach(string language in languages.Split(",".ToCharArray()))
{
  var languageSection =
    ConfigurationManager.GetSection(language) as NameValueCollection;
  AddTranslation(languageSection, Constants.IF);
  AddTranslation(languageSection, Constants.WHILE);
...
}

The AddTranslation method adds a synonym for an already existing function:

public void AddTranslation(NameValueCollection languageDictionary, string originalName)
{
  string translation = languageDictionary[originalName];
  if (string.IsNullOrWhiteSpace(translation))
  {
    // No translation is configured for this name; keep only the original.
    return;
  }
  ParserFunction originalFunction =
    ParserFunction.GetFunction(originalName);
  ParserFunction.AddFunction(translation, originalFunction);
}

Thanks to C#’s support for Unicode, most languages can be added this way. Note that the variable names can be in Unicode, as well.
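The synonym mechanism boils down to storing the same object under two keys. The following self-contained sketch shows that idea with a plain dictionary of delegates; SynonymDemo and its members are hypothetical names, and an in-memory dictionary stands in for the configuration section.

```csharp
using System;
using System.Collections.Generic;

public static class SynonymDemo
{
    private static readonly Dictionary<string, Func<double, double>> s_functions =
        new Dictionary<string, Func<double, double>>();

    public static void Register(string name, Func<double, double> function)
    {
        s_functions[name] = function;
    }

    // Registers the translated name as a second key for the same delegate.
    // Returns false when no translation is available for originalName.
    public static bool AddTranslation(
        IDictionary<string, string> translations, string originalName)
    {
        string translation;
        if (!translations.TryGetValue(originalName, out translation) ||
            string.IsNullOrWhiteSpace(translation))
        {
            return false;
        }
        s_functions[translation] = s_functions[originalName];
        return true;
    }

    public static double Call(string name, double argument)
    {
        return s_functions[name](argument);
    }
}
```

After registering "sqrt" and adding the translation "raíz" for it, both names resolve to the same delegate, so the non-ASCII name needs no special handling.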

All of the translations are specified in the configuration file. This is how the configuration file looks for Spanish:

<Languages>
  <add key="languages" value="Spanish" />
</Languages>
<Spanish>
    <add key="if"    value ="si" />
    <add key="else"  value ="sino" />
    <add key="elif"  value ="sinosi" />
    <add key="while" value ="mientras" />
    <add key="set"   value ="asignar" />
    <add key="print" value ="imprimir" />
 ...
</Spanish>

Here’s an example of the CSCS code in Spanish:

asignar(a, 5);
mientras(a > 0) {
  asignar(expr, 2*(10 - a*3));
  si (expr > 0) {
    imprimir(expr, " es mayor que cero");
  }
  sino {
    imprimir(expr, " es cero o menor que cero");
  }
  asignar(a, a - 1);
}

Note that the Parser can now process control statements and functions in both English and Spanish. There’s no limit to the number of languages you can add.

Wrapping Up

All of the CSCS elements—control flow statements, variables, arrays, and functions—are implemented by defining a C# class deriving from the ParserFunction base class and overriding its Evaluate method. Then you register an object of this class with the Parser. This approach provides the following advantages:

  • Modularity: Each CSCS function and control flow statement resides in its own class, so it’s easy to define a new function or a control flow statement or to modify an existing one.
  • Flexibility: It’s possible to have CSCS keywords and function names in any language. Only the configuration file needs to be modified. Unlike most other languages, in CSCS, control flow statements, functions and variable names don’t have to be in ASCII characters.

Of course, at this stage the CSCS language is far from complete. Here are some ways to make it more useful:

  • Creating multidimensional arrays. The same C# data structure as the one for one-dimensional arrays, List<Result>, can be used. However, more parsing functionality must be added when getting and setting an element of the multidimensional array.
  • Enabling tuples to be initialized on one line.
  • Adding logical operators (AND, OR, NOT and so forth), which would be very useful for if and while statements.
  • Adding the capability to write functions and methods in CSCS. Currently, only functions previously written and compiled in C# can be used.
  • Adding the capability to include CSCS source code from other units.
  • Adding more functions that perform typical OS-related tasks. Because most such tasks can be easily implemented in C#, most would be just a thin wrapper over their C# counterparts.
  • Creating a shortcut for the set(a, b) function as “a = b”.

I hope you’ve enjoyed this glimpse of the CSCS language and seeing how you can create your own custom scripting language.


Vassili Kaplan is a former Microsoft Lync developer. He’s passionate about programming in C# and C++.  He currently lives in Zurich, Switzerland, and works as a freelancer for various banks. You can reach him at iLanguage.ch.

Thanks to the following Microsoft technical expert for reviewing this article: James McCaffrey