It's commonly assumed that a scripting language is an interpreted language, i.e., that its instructions are interpreted by the runtime one by one rather than compiled to machine-executable code all at once. The main advantage of pre-compilation is performance. The drawback is that some flexibility is lost. For instance, if the first nine statements of your script are correct and the tenth is wrong, an interpreter will at least execute the first nine. When you compile those ten statements, either all ten compile and then run, or none of them run.

In this article, I'm going to discuss how you can improve the performance of a scripting language. The article is aimed mostly at custom scripting languages, but the techniques can also be applied to industry-standard scripting languages. As the sample language to compile, I'm going to use CSCS (Customized Scripting in C#). I've talked about this language in previous CODE Magazine articles: https://www.codemag.com/article/1607081 introduced it, https://www.codemag.com/article/1711081 showed how you can use it on top of Xamarin to create cross-platform native mobile apps, and https://www.codemag.com/article/1903081 showed how you can use it for Unity programming.

To simplify things, I'm not going to pre-compile the whole script, only a function that contains script code. Ultimately, the whole script can be split into functions.

To pre-compile a function, I'll use the following strategy:

- Translate the function's scripting code into C# code.
- Compile that code into a C# assembly.
- Load the assembly into the running process at runtime.

Then, whenever there's a request to run the compiled code, I'll bind the runtime function arguments to the pre-compiled function's arguments and execute the compiled code.


To accomplish this task, let's use the Microsoft.CSharp and System.CodeDom.Compiler namespaces, which ship with the .NET Framework.

There are two restrictions:

- You can't use the techniques explained in this article for Xamarin mobile development for iOS and Android. This restriction is imposed by the iOS and Android architectures.
- On .NET Core, CSharpCodeProvider.CompileAssemblyFromSource() throws a PlatformNotSupportedException.

CSCS can still be used for cross-platform mobile development, but without pre-compilation.

That's it! Sounds easy? Well, one part isn't necessarily straightforward: translating the scripting code into C#. How hard that is depends on the scripting language. If it's your own language, chances are it doesn't have as many options and functions as Python, and you should be able to do the translation.

Using CSCS as the sample language has an advantage: CSCS itself is implemented in C#. Still, there are some quirks, because the syntax isn't the same, and in CSCS variable types aren't declared explicitly but are deduced from the context. I hope you can use the techniques explained in this article for other scripting languages as well.

Compiling a “Hello, World!” Script

Let's start with a relatively simple example that should tell you where you're heading. Consider the following CSCS function that you'll compile:

cfunction helloCompiled(name, int n)
{
    for (i = 1; i <= n; i++) 
    {
        printc("Hello, " + name + "! 2^" + i + " = " + pow(2, i));
    }
}

Listing 1 shows the resulting C# code after translating the code above to C#. The main function body of the resulting C# code is the following:

Variable __varTempVar = null;
var i = 1;
for (i = 1; i <= __varNum[0]; i++) 
{
    ParserFunction.AddGlobalOrLocalVariable("i", new GetVarFunction(Variable.ConvertToVariable(i)));
    Console.WriteLine("Hello, " + __varStr[0] + "! 2^" + i + " = " + Pow(2, i));
}
__varTempVar = Variable.EmptyInstance;
return __varTempVar;

Listing 1 contains more C# code than that. Below, you'll see why the extra code is needed and what it does.

Listing 1: C# Code Generated from the CSCS Function helloCompiled

using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Globalization;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using static System.Math;

namespace SplitAndMerge
{
    public partial class Precompiler
    {
        public static Variable helloCompiled(List<string> __varStr,
                                             List<double> __varNum,
                                             List<List<string>> __varArrStr,
                                             List<List<double>> __varArrNum,
                                             List<Dictionary<string, string>> __varMapStr,
                                             List<Dictionary<string, double>> __varMapNum,
                                             List<Variable> __varVar)    
        {
            string __argsTempStr = "";
            string __actionTempVar = "";
            ParsingScript __scriptTempVar = null;
            ParserFunction __funcTempVar = null;
            Variable __varTempVar = null;
            
            var i = 1;
            for (i = 1; i <= __varNum[0]; i++)
            {
                ParserFunction.AddGlobalOrLocalVariable("i", new GetVarFunction(Variable.ConvertToVariable(i)));
                Console.WriteLine("Hello, " + __varStr[0] + "! 2^" + i + " = " + Pow(2, i));
            }
            __varTempVar = Variable.EmptyInstance;
            return __varTempVar;
        }
    }
}

The main difference between a normal CSCS function and a CSCS function that's intended to be compiled is the header. The cfunction keyword tells the CSCS parsing runtime that the function is intended to be compiled.

Regular CSCS function arguments never have declared types, because their types are deduced at runtime. A cfunction's arguments may carry types, because in C# the types of all arguments must be known at compile time. When an argument's type isn't supplied (like name in the helloCompiled code snippet above), the argument is treated as a string.

Note that it's not necessary to specify the function return type. This is because when translated to C#, the resulting C# function always returns a Variable object. This Variable object is a static Variable.EmptyInstance in case nothing needs to be returned, as in our example above. In other words, Variable.EmptyInstance imitates a void function. Otherwise, the Variable object being returned holds the return value.

Here's the created C# function signature (this signature will be the same for all of the C# functions compiled from CSCS):

public static Variable helloCompiled(
   List<string>                     __varStr,
   List<double>                     __varNum,
   List<List<string>>               __varArrStr,
   List<List<double>>               __varArrNum,
   List<Dictionary<string, string>> __varMapStr,
   List<Dictionary<string, double>> __varMapNum,
   List<Variable>                   __varVar)

The arguments for all of the compiled C# functions are always the same. This is because all of the string arguments in the CSCS cfunction definition are a part of the C# List of strings __varStr, all of the numeric arguments are a part of the C# List of doubles __varNum, all of the arrays of strings are inside of the List<List<string>> __varArrStr, and so on.

This explains why the CSCS string variable name was replaced with the C# variable __varStr[0], and the CSCS integer variable n with __varNum[0], in the resulting C# function in Listing 1.
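To make the slot assignment concrete, here's a minimal sketch of how declared arguments could be mapped to list elements. It isn't the CSCS code. The MapArguments() helper and the two kind names "string" and "number" are hypothetical stand-ins for Variable.VarType and the mapping loop in ConvertScript(). Each argument simply takes the next free index in the list for its kind.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: give each declared argument the next free slot in the
// list for its kind, so "name" becomes __varStr[0], "n" becomes __varNum[0], etc.
// Only two kinds are modeled here; the real code uses Variable.VarType.
static Dictionary<string, string> MapArguments(IEnumerable<(string Name, string Kind)> declaredArgs)
{
    var listForKind = new Dictionary<string, string>
    {
        ["string"] = "__varStr",
        ["number"] = "__varNum",
    };
    var nextIndex = new Dictionary<string, int>();
    var slots = new Dictionary<string, string>();
    foreach (var (name, kind) in declaredArgs)
    {
        nextIndex.TryGetValue(kind, out int index);   // 0 if this kind hasn't been seen yet
        slots[name] = $"{listForKind[kind]}[{index}]";
        nextIndex[kind] = index + 1;
    }
    return slots;
}

// A hypothetical cfunction greet(name, int n, greeting) would map like this:
var map = MapArguments(new[] { ("name", "string"), ("n", "number"), ("greeting", "string") });
Console.WriteLine(map["name"]);      // __varStr[0]
Console.WriteLine(map["n"]);         // __varNum[0]
Console.WriteLine(map["greeting"]);  // __varStr[1]
```

Because the index is tracked per kind, adding a new string argument never shifts the positions of the numeric ones.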

Because you're using lists in the function signature, you can have an unlimited number of function arguments of different types.


To run the compiled function, the CSCS call is the same as it would've been when running a non-compiled version:

helloCompiled("World", 5);

The results of running the helloCompiled script defined above are the following:

Hello, World! 2^1 = 2
Hello, World! 2^2 = 4
Hello, World! 2^3 = 8
Hello, World! 2^4 = 16
Hello, World! 2^5 = 32

One question might arise: How does C# know that pow(2, i) refers to the System.Math class, i.e., that it's equivalent to Math.Pow(2, i)? Check Listing 1 for the following header line:

using static System.Math;

This directive lets you call the static members of the Math class without the Math. prefix.

You just need to uppercase the first letter of the token and lowercase the rest. That's how pow(2, i) got converted to Pow(2, i). This works because most of the members of the System.Math class follow this capitalization pattern. Some don't: methods such as IEEERemainder() and DivRem() are exceptions that this simple rule misses. The Math.PI constant is handled explicitly. See the definition of IsMathFunction() later in this article; that function also checks whether the passed token is a math function at all.

I'll be looking into how to compile CSCS functions in the next section.

Compiling C# Code at Runtime

To compile the C# code at runtime, you're going to use Microsoft.CSharp and System.CodeDom.Compiler namespaces. Listing 2 contains Precompiler variable definitions that are going to be used at the compilation stage.

Listing 2: Main Definitions of the Precompiler Class

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Reflection;
using System.Linq.Expressions;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

namespace SplitAndMerge
{
    public class Precompiler
    {
        string m_cscsCode;
        public string CSharpCode { get; private set; }
        string[] m_actualArgs;
        StringBuilder m_converted = new StringBuilder();
        Dictionary<string, Variable> m_argsMap;
        Dictionary<string, string> m_;
        HashSet<string> m_newVariables = new HashSet<string>();
        string m_currentStatement;
        string m_nextStatement;
        string m_depth;
        bool m_knownExpression;
        
        Func<List<string>, List<double>, List<List<string>>, List<List<double>>, List<Dictionary<string, string>>, List<Dictionary<string, double>>, List<Variable>, Variable> m_compiledFunc;
        
        static List<string> s_namespaces = new List<string>();
        Dictionary<string, string> m_paramMap = new Dictionary<string, string>();
        
        static string NUMERIC_VAR_ARG = "__varNum";
        static string STRING_VAR_ARG = "__varStr";
        static string NUMERIC_ARRAY_ARG = "__varArrNum";
        static string STRING_ARRAY_ARG = "__varArrStr";
        static string NUMERIC_MAP_ARG = "__varMapNum";
        static string STRING_MAP_ARG = "__varMapStr";
        static string CSCS_VAR_ARG = "__varVar";
        
        static string ARGS_TEMP_VAR = "__argsTempStr";
        static string SCRIPT_TEMP_VAR = "__scriptTempVar";
        static string PARSER_TEMP_VAR = "__funcTempVar";
        static string ACTION_TEMP_VAR = "__actionTempVar";
        static string VARIABLE_TEMP_VAR = "__varTempVar";

The main result of the compilation is the following function delegate:

Func<List<string>,
     List<double>,
     List<List<string>>,
     List<List<double>>,
     List<Dictionary<string, string>>,
     List<Dictionary<string, double>>,
     List<Variable>,
     Variable> m_compiledFunc;

This delegate is used when running a pre-compiled function at runtime. Its first seven type arguments are the parameter types of the C# method that you're going to create (see the definition of the helloCompiled() function in the previous section). The last type argument, Variable, is the return type of that method.

Listing 3 contains the compilation code. Let's briefly discuss it.

Listing 3: Main Method to Compile C# Code

public void Compile()
{
    var compilerParams = new CompilerParameters
    {
        GenerateInMemory = true,        // load the new assembly directly into the process
        TreatWarningsAsErrors = false,
        GenerateExecutable = false,     // build a class library, not an .exe
        CompilerOptions = "/optimize"
    };
    
    // Reference every assembly loaded into the current AppDomain, so that the
    // generated code can use any class the host application can use
    Assembly[] assemblies = AppDomain.CurrentDomain.GetAssemblies();
    foreach (Assembly asm in assemblies) 
    {
        if (asm.IsDynamic)
        {
            continue;   // in-memory assemblies have no file to reference
        }
        AssemblyName asmName = asm.GetName();
        if (asmName == null || string.IsNullOrWhiteSpace(asmName.CodeBase)) 
        {
            continue;
        }
        
        var uri = new Uri(asmName.CodeBase);
        if (File.Exists(uri.LocalPath)) 
        {
            compilerParams.ReferencedAssemblies.Add(uri.LocalPath);
        }
    }
    
    // Translate the CSCS function into C# source code
    CSharpCode = ConvertScript();
    
    var provider = new CSharpCodeProvider();
    CompilerResults compile = provider.CompileAssemblyFromSource(compilerParams, CSharpCode);
    if (compile.Errors.HasErrors) 
    {
        string text = "Compile error: ";
        foreach (CompilerError ce in compile.Errors) 
        {
            text += ce.ToString() + " -- ";
        }
        throw new ArgumentException(text);
    }

    // Build and cache a delegate that calls the compiled method
    m_compiledFunc = CompileAndCache(compile, m_functionName);
}

First, the code creates the System.CodeDom.Compiler.CompilerParameters object that's used at the compilation stage. Note that all of the assemblies currently loaded into the application domain are added as references of the assembly being compiled. This way, you can use all of your C# classes in the compiled functions. The assemblies are collected as follows:

Assembly[] assemblies = AppDomain.CurrentDomain.GetAssemblies();
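Here is a sketch of the same collection step, written so that it also runs on newer .NET versions, where AssemblyName.CodeBase is obsolete and CodeDom compilation isn't available. It uses Assembly.Location and skips dynamic assemblies, which have no file. A path list like this is what a compiler such as Roslyn would take as references. Treat it as an illustration, not as part of the CSCS code.

```csharp
using System;
using System.IO;
using System.Linq;

// Collect file paths of the assemblies loaded into the current AppDomain, to be
// passed to a compiler as references. Dynamic (in-memory) assemblies have no
// file, so they are skipped; Assembly.Location replaces AssemblyName.CodeBase.
var paths = AppDomain.CurrentDomain.GetAssemblies()
    .Where(asm => !asm.IsDynamic && !string.IsNullOrEmpty(asm.Location))
    .Select(asm => asm.Location)
    .Where(File.Exists)
    .Distinct(StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine($"{paths.Count} reference paths found");
foreach (string path in paths)
{
    Console.WriteLine(path);
}
```

Checking IsDynamic first matters: reading Location on a dynamic assembly throws NotSupportedException.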


Then you get the actual C# code to compile from the ConvertScript() method. I'll talk about how to convert a CSCS script to C# code in the next section.

The actual compilation and the creation of the new assembly takes place in the Microsoft.CSharp.CSharpCodeProvider.CompileAssemblyFromSource() method.

After the code has been compiled, you need a way to call it later. This is done in the CompileAndCache() method in Listing 4. CompileAndCache() proceeds in three steps:

- It finds the generated method by reflection.
- It builds a System.Linq.Expressions expression that calls this method with the seven input parameters.
- It compiles that expression into a delegate.

Compile() stores the returned delegate in m_compiledFunc (see its definition in Listing 2), which is used later to invoke the compiled method at runtime.

Listing 4: An Auxiliary Function to Compile and Cache the Results

static Func<List<string>, List<double>, List<List<string>>, List<List<double>>,
            List<Dictionary<string, string>>, List<Dictionary<string, double>>,
            List<Variable>, Variable> CompileAndCache(CompilerResults compile, string functionName)
{
    // The generated code always lives in the SplitAndMerge.Precompiler class
    Module module = compile.CompiledAssembly.GetModules()[0];
    Type mt = module.GetType("SplitAndMerge.Precompiler");
    
    // One parameter expression per argument list, in the order of the generated signature
    var paramTypes = new List<ParameterExpression>();
    paramTypes.Add(Expression.Parameter(typeof(List<string>), STRING_VAR_ARG));
    paramTypes.Add(Expression.Parameter(typeof(List<double>), NUMERIC_VAR_ARG));
    paramTypes.Add(Expression.Parameter(typeof(List<List<string>>), STRING_ARRAY_ARG));
    paramTypes.Add(Expression.Parameter(typeof(List<List<double>>), NUMERIC_ARRAY_ARG));
    paramTypes.Add(Expression.Parameter(typeof(List<Dictionary<string, string>>), STRING_MAP_ARG));
    paramTypes.Add(Expression.Parameter(typeof(List<Dictionary<string, double>>), NUMERIC_MAP_ARG));
    paramTypes.Add(Expression.Parameter(typeof(List<Variable>), CSCS_VAR_ARG));
    
    // Find the generated method by name and exact parameter types
    List<Type> argTypes = new List<Type>();
    for (int i = 0; i < paramTypes.Count; i++) 
    {
        argTypes.Add(paramTypes[i].Type);
    }
    MethodInfo methodInfo = mt.GetMethod(functionName, argTypes.ToArray());
    
    // Build (args...) => Precompiler.<functionName>(args...) and compile it into a delegate
    MethodCallExpression methodCall = Expression.Call(methodInfo, paramTypes);
    var lambda = Expression.Lambda<Func<List<string>,
                                   List<double>, List<List<string>>,
                                   List<List<double>>, List<Dictionary<string, string>>,
                                   List<Dictionary<string, double>>, List<Variable>,
                                   Variable>>(methodCall, paramTypes.ToArray());
    var func = lambda.Compile();
    
    return func;
}
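The technique in Listing 4 is turning a MethodInfo into a strongly typed delegate through an expression tree. Here it is on a smaller scale. Math.Pow stands in for the method found in the generated assembly, and BindStatic() is a hypothetical helper, not part of CSCS.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Build a strongly typed delegate for a static method found by reflection.
// Math.Pow stands in for the method in the generated assembly.
static Func<double, double, double> BindStatic(Type type, string name)
{
    var parameters = new[]
    {
        Expression.Parameter(typeof(double), "x"),
        Expression.Parameter(typeof(double), "y"),
    };
    MethodInfo method = type.GetMethod(name, parameters.Select(p => p.Type).ToArray())
        ?? throw new MissingMethodException(type.FullName, name);
    // (x, y) => Type.Name(x, y)
    MethodCallExpression call = Expression.Call(method, parameters);
    return Expression.Lambda<Func<double, double, double>>(call, parameters).Compile();
}

Func<double, double, double> pow = BindStatic(typeof(Math), "Pow");
Console.WriteLine(pow(2, 10));

// For a direct call like this one, MethodInfo.CreateDelegate() gives an equivalent delegate
var direct = (Func<double, double, double>)typeof(Math)
    .GetMethod("Pow", new[] { typeof(double), typeof(double) })
    .CreateDelegate(typeof(Func<double, double, double>));
Console.WriteLine(direct(2, 10));
```

When the call needs no argument conversion, CreateDelegate() is the simpler route. The expression-tree route pays off once you want to add conversions or extra logic around the call.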

Converting Scripting Code to C#

Probably the most complicated step is converting the scripting code to the C# code. In this section, you'll see how it's done for CSCS scripting. The strategy is to start converting something small and then gradually extend the conversion.

You start with a small trivial project, and you should never expect it to get large. If you do, you'll just overdesign. –Linus Torvalds

Listing 5 shows the main conversion method, ConvertScript(). First, it splits the script into statements, using the ";", "{", and "}" characters as separators. Then it converts the statements one by one, looking ahead at the next statement.

Listing 5: Main Method to Convert CSCS script to C#

string ConvertScript()
{
    m_converted.Clear();
    int numIndex = 0;
    int strIndex = 0;
    int arrNumIndex = 0;
    int arrStrIndex = 0;
    int mapNumIndex = 0;
    int mapStrIndex = 0;
    int varIndex = 0;
    
    // Mapping from the original arg to the element array it is in
    for (int i = 0; i < m_actualArgs.Length; i++) 
    {
        Variable typeVar = m_argsMap[m_actualArgs[i]];
        m_paramMap[m_actualArgs[i]] = typeVar.Type == Variable.VarType.STRING ? STRING_VAR_ARG + "[" + (strIndex++) + "]" : 
            typeVar.Type == Variable.VarType.NUMBER ? NUMERIC_VAR_ARG + "[" + (numIndex++) + "]" :
            typeVar.Type == Variable.VarType.ARRAY_STR ? STRING_ARRAY_ARG+ "[" + (arrStrIndex++) + "]" :
            typeVar.Type == Variable.VarType.ARRAY_NUM ? NUMERIC_ARRAY_ARG+ "[" + (arrNumIndex++) + "]" :
            typeVar.Type == Variable.VarType.MAP_STR ? STRING_MAP_ARG + "[" + (mapStrIndex++) + "]" :
            typeVar.Type == Variable.VarType.MAP_NUM ? NUMERIC_MAP_ARG + "[" + (mapNumIndex++) + "]" :
            typeVar.Type == Variable.VarType.VARIABLE ? CSCS_VAR_ARG + "[" + (varIndex++) + "]" : "";
    }
    
    // Verbatim string (@"...") so the literal can span several lines
    m_converted.AppendLine(@"using System;
                            using System.Collections;
                            using System.Collections.Generic;
                            using System.Collections.Specialized;
                            using System.Globalization;
                            using System.Linq;
                            using System.Linq.Expressions;
                            using System.Reflection;
                            using System.Text;
                            using System.Threading;
                            using System.Threading.Tasks;
                            using static System.Math;
                            ");
                            
    for (int i = 0; i < s_namespaces.Count; i++) 
    {
        m_converted.AppendLine(s_namespaces[i]);
    }
    
    m_converted.AppendLine("namespace SplitAndMerge {\n" + "  public partial class Precompiler {");
    m_converted.AppendLine("    public static Variable " + m_functionName);
    m_converted.AppendLine("(List<string> " + STRING_VAR_ARG +",\n"+
                           " List<double> " + NUMERIC_VAR_ARG + ",\n"+
                           " List<List<string>> " + STRING_ARRAY_ARG + ",\n" +
                           " List<List<double>> " + NUMERIC_ARRAY_ARG + ",\n" +
                           " List<Dictionary<string, string>> " + STRING_MAP_ARG + ",\n" +
                           " List<Dictionary<string, double>> " + NUMERIC_MAP_ARG + ",\n"+
                           " List<Variable> " + CSCS_VAR_ARG + ") {\n");
    m_depth = "      ";
    
    m_converted.AppendLine(" string " + ARGS_TEMP_VAR + "= \"\";");
    m_converted.AppendLine(" string " + ACTION_TEMP_VAR+" = \"\";");
    m_converted.AppendLine(" ParsingScript " +SCRIPT_TEMP_VAR + " = null;");
    m_converted.AppendLine(" ParserFunction " + PARSER_TEMP_VAR + " = null;");
    m_converted.AppendLine(" Variable "+VARIABLE_TEMP_VAR+" =null;");  
    m_newVariables.Add(ARGS_TEMP_VAR);
    m_newVariables.Add(ACTION_TEMP_VAR);
    m_newVariables.Add(SCRIPT_TEMP_VAR);
    m_newVariables.Add(PARSER_TEMP_VAR);
    m_newVariables.Add(VARIABLE_TEMP_VAR);
    
    m_cscsCode = Utils.ConvertToScript(m_originalCode, out _);
    RemoveIrrelevant(m_cscsCode);
    
    m_statements = TokenizeScript(m_cscsCode);
    m_statementId = 0;
    while (m_statementId < m_statements.Count) 
    {
        m_currentStatement = m_statements[m_statementId];
        m_nextStatement = m_statementId < m_statements.Count - 1 ? m_statements[m_statementId + 1] : "";
        string converted = ProcessStatement(m_currentStatement, m_nextStatement, true);
        if (!string.IsNullOrWhiteSpace(converted)) 
        {
            m_converted.Append(m_depth + converted);
        }
        m_statementId++;
    }
    
    if (!m_lastStatementReturn) 
    {
        m_converted.AppendLine(CreateReturnStatement("Variable.EmptyInstance"));
    }
    
    m_converted.AppendLine("\n    }\n    }\n}");
    return m_converted.ToString();
}
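TokenizeScript() isn't shown in the listing. The statement-splitting step described before Listing 5 can be sketched as follows. SplitStatements() is hypothetical, and it assumes one particular behavior: braces are kept as separate entries, and semicolons inside parentheses (such as in a for-loop header) or inside string literals don't end a statement. The loop at the end mimics the one-statement lookahead in ConvertScript().

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical statement splitter: splits at ';', '{' and '}' and keeps each
// brace as its own entry. Semicolons inside parentheses (for-loop headers) or
// inside string literals don't end a statement. Escaped quotes are not handled.
static List<string> SplitStatements(string code)
{
    var statements = new List<string>();
    var current = new StringBuilder();
    int parenDepth = 0;
    bool inString = false;

    void Flush()
    {
        string s = current.ToString().Trim();
        if (s.Length > 0)
        {
            statements.Add(s);
        }
        current.Clear();
    }

    foreach (char ch in code)
    {
        if (ch == '"')
        {
            inString = !inString;
        }
        else if (!inString)
        {
            if (ch == '(') parenDepth++;
            else if (ch == ')') parenDepth--;
            else if (parenDepth == 0 && (ch == '{' || ch == '}'))
            {
                Flush();
                statements.Add(ch.ToString());
                continue;
            }
            else if (parenDepth == 0 && ch == ';')
            {
                Flush();
                continue;
            }
        }
        current.Append(ch);
    }
    Flush();
    return statements;
}

var body = "for (i = 1; i <= n; i++) { printc(\"a;b\" + i); }";
List<string> parts = SplitStatements(body);

// Process one statement at a time, looking ahead one statement
for (int k = 0; k < parts.Count; k++)
{
    string next = k < parts.Count - 1 ? parts[k + 1] : "";
    Console.WriteLine($"[{parts[k]}]  next: [{next}]");
}
```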

Note the following for-loop in the ConvertScript method:

for (int i = 0; i < s_namespaces.Count; i++)
{
    m_converted.AppendLine(s_namespaces[i]);
}

It allows adding any namespace to the Precompiler by using its AddNamespace() static method:

public static void AddNamespace(string ns)
{
    s_namespaces.Add(ns);
}

For example, this can be used as follows:

Precompiler.AddNamespace("using MyNamespace;");

Each statement is split into a list of tokens, and the tokens are processed one by one, looking ahead a few tokens. I won't show the full implementation here (you can find it in the accompanying source code or on GitHub), but I'm going to discuss some of the main points of the conversion.

When processing each statement token, the converter makes several checks. For instance, it checks whether a particular CSCS token is overridden by a C# function. This is the case with the printc token in the “Hello, World!” example in the first section, where it was replaced by the C# Console.WriteLine() statement. All of these token overrides happen in the GetCSharpFunction() method. Here's an implementation of this method with just one token, printc, overridden. This is where you can add as many overrides as you wish:

string GetCSharpFunction(string functionName, string arguments = "")
{
    if (functionName == "printc") 
    {
        arguments = ReplaceArgsInString(arguments.Replace("\\\"", "\""));
        return "Console.WriteLine(" + arguments + ");";
    }
    
    return "";
}

A special case is parsing a for-loop. Consider the statement for (i = 1; i <= n; i++) of the helloCompiled function from the first section. You don't know if the variable i was defined before the for-loop or not. It doesn't matter for CSCS because you don't define variables before they're used and their type is always deduced from the expression. Because this does matter in C#, you check whether the variable i has been defined in this method before the for-loop, and if not, you prepend the var i = 1; statement before the for-loop (see Listing 1).
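The declaration step for for-loop counters can be sketched like this. DeclareLoopVariable() is a hypothetical helper, not the CSCS implementation. It assumes two things:

- The for-loop header is available as a single string.
- The counter's initial value fixes its C# type through var, so for (i = 1; ...) yields an int.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: declares a for-loop counter in the generated C#
// unless an earlier statement already declared it.
static string DeclareLoopVariable(string forHeader, HashSet<string> declared)
{
    // Matches "for (<name> = <initial value>;" at the start of the header
    Match m = Regex.Match(forHeader, @"^\s*for\s*\(\s*(\w+)\s*=\s*([^;]+);");
    if (!m.Success || declared.Contains(m.Groups[1].Value))
    {
        return forHeader;
    }
    string name = m.Groups[1].Value;
    declared.Add(name);
    return $"var {name} = {m.Groups[2].Value.Trim()};\n{forHeader}";
}

var known = new HashSet<string>();
string first = DeclareLoopVariable("for (i = 1; i <= __varNum[0]; i++)", known);
string second = DeclareLoopVariable("for (i = 0; i < 3; i++)", known);
Console.WriteLine(first);   // declaration added: i is new
Console.WriteLine(second);  // unchanged: i is already declared
```

A counter initialized with a double, such as for (x = 0.5; ...), would become a double through var. A counter that is first initialized with an int and later meant to hold fractions would need a different rule.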

Let's now see how you can figure out whether a particular token is a mathematical function. The CSCS language is case-insensitive, but most of the mathematical functions have the first letter in uppercase and the rest in lowercase. The exception is the Math.PI constant. This is how IsMathFunction() deals with it:

public static bool IsMathFunction(string name, out string corrected)
{
    corrected = name;
    // Invariant casing, so that the result doesn't depend on the current culture
    string candidate = name[0].ToString().ToUpperInvariant() + name.Substring(1).ToLowerInvariant();
    if (candidate == "Pi")
    {
        corrected = "Math.PI";
        return true;
    }

Otherwise, you check whether the candidate name exists as a method of the System.Math class:

    Type mathType = typeof(System.Math);
    try 
    {
        MethodInfo myMethod = mathType.GetMethod(candidate);
        if (myMethod != null) 
        {
            corrected = candidate;
            return true;
        }
        return false;
    }
    catch (AmbiguousMatchException) 
    {
        // GetMethod() throws for overloaded names such as Abs() or Max(),
        // which still means that the function exists
        corrected = candidate;
        return true;
    }
}
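As an alternative, IsMathFunction() could avoid the exception path entirely by scanning the public static methods of System.Math for an exact name match. Here's a sketch. TryMapMathName() is hypothetical, and it keeps the same capitalization heuristic as IsMathFunction().

```csharp
using System;
using System.Linq;

// Hypothetical variant of IsMathFunction(): instead of calling GetMethod() and
// catching AmbiguousMatchException for overloaded names, it scans all public
// static methods of System.Math for an exact name match.
static bool TryMapMathName(string name, out string corrected)
{
    // CSCS is case-insensitive: "POW", "Pow" and "pow" all become "Pow"
    string candidate = char.ToUpperInvariant(name[0]) + name.Substring(1).ToLowerInvariant();
    if (candidate == "Pi")
    {
        corrected = "Math.PI";
        return true;
    }
    bool found = typeof(Math).GetMethods().Any(m => m.IsStatic && m.Name == candidate);
    corrected = found ? candidate : name;
    return found;
}

foreach (string token in new[] { "pow", "SQRT", "pi", "ieeeremainder", "print" })
{
    bool isMath = TryMapMathName(token, out string mapped);
    Console.WriteLine($"{token} -> {(isMath ? mapped : "not a Math function")}");
}
```

The "ieeeremainder" token isn't recognized, because the real method name is IEEERemainder. This shows where the capitalization heuristic stops working. A case-insensitive name comparison would catch such names, at the cost of then having to emit the method's exact name.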

One of the most important methods for converting a CSCS script to C# code is the ResolveToken() method, shown in Listing 6. What happens if ResolveToken() doesn't resolve the token (i.e., the resolved output parameter is false after the call)? This happens when the passed token:

Is not a string or a number
Is not one of the function arguments (they are all keys of the m_paramMap dictionary)
Is not a mathematical function from System.Math
Is not a special case of a C# function defined in the GetCSharpFunction() method
Is not one of the variables that have been already defined in this method (they are all part of the m_newVariables list)

Listing 6: Implementation of the ResolveToken Method

string ResolveToken(string token, out bool resolved, string arguments = "")
{ 
    resolved = true;
    if (IsString(token) || IsNumber(token)) 
    {
        return token;
    }
    
    string replacement;
    if (IsMathFunction(token, out replacement)) 
    {
        return replacement;
    }
    
    replacement = GetCSharpFunction(token, arguments);
    if (!string.IsNullOrEmpty(replacement)) 
    {
        return replacement;
    }
    
    if (ProcessArray(token, ref replacement)) 
    {
        return replacement;
    }
    
    string arrayName, arrayArg;
    if (IsArrayElement(token, out arrayName, out arrayArg)) 
    {
        token = arrayName;
    }
    
    if (m_paramMap.TryGetValue(token, out replacement)) 
    {
        return replacement + arrayArg;
    }
    
    resolved = !string.IsNullOrWhiteSpace(arrayArg) || m_newVariables.Contains(token);
    return token + arrayArg;
}

If a token can't be resolved, you treat it the way the interpreter treats any CSCS token: as a CSCS function or variable, resolved as part of the CSCS script. This is done in the GetCSCSFunction() method:

string GetCSCSFunction(string argsStr, string functionName, char ch = '(') 
{
    StringBuilder sb = new StringBuilder();
    sb.AppendLine(m_depth + ARGS_TEMP_VAR + " =\"" + argsStr + "\";");
    sb.AppendLine(m_depth + SCRIPT_TEMP_VAR + " = new ParsingScript("+ARGS_TEMP_VAR+");");
    sb.AppendLine(m_depth + PARSER_TEMP_VAR + " = new ParserFunction("+SCRIPT_TEMP_VAR+", \"" + functionName + "\", '" + ch + "', ref " + ACTION_TEMP_VAR + ");");
    sb.AppendLine(m_depth + VARIABLE_TEMP_VAR + " = "+ PARSER_TEMP_VAR+".GetValue("+SCRIPT_TEMP_VAR+");");
    return sb.ToString();
}

Among other things, the method above emits the following two lines of the Split-and-Merge parsing algorithm, on which CSCS parsing is based (see https://msdn.microsoft.com/en-us/magazine/mt573716.aspx):

__funcTempVar = new ParserFunction(__scriptTempVar, functionName, ch, ref __actionTempVar);
__varTempVar = __funcTempVar.GetValue(__scriptTempVar);

The first statement gets the appropriate CSCS implementation function (an object derived from the ParserFunction class and previously registered with the Parser), and the second statement will eventually invoke the Evaluate() protected method on the object from the first statement.

Every time you can't resolve a token, you assume that it's a CSCS token and perform same steps that you would've performed when interpreting this token with the CSCS parser.

Let's see an example where you have a print token instead of the printc token in the “Hello, World!” script you saw in the first section, i.e., suppose that the CSCS script is the following:

cfunction helloCompiled2(name, int n)
{
    for (i = 1; i <= n; i++) 
    {
        print("Hello, " + name + "! 2^" + i + " = " + pow(2, i));
    }
}


Then the resulting C# code inside of the for-loop will be different because the print token won't be resolved in the ResolveToken() method and therefore GetCSCSFunction() will be called. This function creates most of the code inside of the for-loop:

for (i = 1; i <= __varNum[0]; i++) 
{
    ParserFunction.AddGlobalOrLocalVariable("i", new GetVarFunction( Variable.ConvertToVariable(i)));
    __actionTempVar ="";
    __argsTempStr = "\"Hello, \"+name+\"! 2^\"+i+\" = \"+pow(2,i)";
    __scriptTempVar = new ParsingScript(__argsTempStr);
    __funcTempVar = new ParserFunction(__scriptTempVar,  "print", '(', ref __actionTempVar);
    __varTempVar = __funcTempVar.GetValue(__scriptTempVar);
}
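The fallback lines in the for-loop above can be produced by a small string emitter. Here's a sketch under a few assumptions:

- EscapeForLiteral() and EmitInterpreterCall() are hypothetical helpers. Unlike GetCSCSFunction(), EmitInterpreterCall() escapes the argument text itself.
- The '(' separator is hard-coded.
- The temporary-variable names follow the article.

The emitted lines only compile inside the CSCS runtime, where ParsingScript and ParserFunction exist.

```csharp
using System;
using System.Text;

// Hypothetical fallback emitter: when a token can't be resolved, emit C# that
// hands the raw CSCS argument text to the interpreter at runtime.
static string EscapeForLiteral(string text) =>
    text.Replace("\\", "\\\\").Replace("\"", "\\\"");

static string EmitInterpreterCall(string argsStr, string functionName, string indent)
{
    var sb = new StringBuilder();
    sb.AppendLine($"{indent}__argsTempStr = \"{EscapeForLiteral(argsStr)}\";");
    sb.AppendLine($"{indent}__scriptTempVar = new ParsingScript(__argsTempStr);");
    sb.AppendLine($"{indent}__funcTempVar = new ParserFunction(__scriptTempVar, \"{functionName}\", '(', ref __actionTempVar);");
    sb.AppendLine($"{indent}__varTempVar = __funcTempVar.GetValue(__scriptTempVar);");
    return sb.ToString();
}

Console.Write(EmitInterpreterCall("\"Hello, \"+name", "print", "    "));
```

Escaping matters because the CSCS arguments usually contain their own string literals. Without escaping, the emitted C# line would end the string early and fail to compile.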

You probably noticed that every time a variable changes its value in the CSCS script (either by an assignment “=” or by any other operator like "*=", "+=", etc.), a statement like the following is inserted into the C# code:

ParserFunction.AddGlobalOrLocalVariable("i", new GetVarFunction(Variable.ConvertToVariable(i)));

This registers the new variable value with the Parser runtime, so the runtime knows about any changes made in the C# code. Without this statement, the value of “i” would be updated in the C# code but not in any CSCS function that might be called from it. ConvertToVariable() is a convenience method that wraps a value of any supported type (string, number, array, etc.) in a Variable.
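Here's a sketch of how the converter could decide where to insert this statement, assuming converted statements are available as strings. WithSync() is hypothetical. It recognizes only simple and compound assignments (=, +=, -=, *=, /=, %=), so increments like i++ would need separate handling. In the generated for-loop above, the counter is instead re-registered at the top of the loop body.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: if a converted statement assigns to a variable
// ("=", "+=", "-=", "*=", "/=", "%="), append a statement that registers the
// new value with the CSCS runtime. Comparisons such as "==" or "<=" don't match.
static string WithSync(string statement)
{
    Match m = Regex.Match(statement, @"^\s*(\w+)\s*[-+*/%]?=(?!=)");
    if (!m.Success)
    {
        return statement;
    }
    string name = m.Groups[1].Value;
    return statement + "\nParserFunction.AddGlobalOrLocalVariable(\"" + name +
           "\", new GetVarFunction(Variable.ConvertToVariable(" + name + ")));";
}

Console.WriteLine(WithSync("total += 5;"));
Console.WriteLine(WithSync("x == 3"));
```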

Registering CSCS Functions for Compilation with Parser

To let the CSCS Parser runtime know that the token cfunction means “pre-compile a function,” you need to register the cfunction handler in the initialization phase as follows:

ParserFunction.RegisterFunction("cfunction", new CompiledFunctionCreator());

As usual, the Evaluate() method of the CompiledFunctionCreator class does the actual work: it translates the CSCS script into C# and then compiles it:

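protected override Variable Evaluate(ParsingScript script)
{
    // Name of the function being defined, e.g., helloCompiled
    string funcName;
    Utils.GetCompiledArgs(script, out funcName);
    
    // Argument names, plus a map from each name to a Variable holding its type
    Dictionary<string, Variable> argsMap;
    var args = Utils.GetCompiledFunctionSignature(script, out argsMap);
    
    // CSCS source of the function body
    string body = Utils.GetBodyBetween(script, '{', '}');
    
    // Translate the body to C# and compile it
    Precompiler precompiler = new Precompiler(funcName, args, argsMap, body, script);
    precompiler.Compile();
    
    // Register the compiled function, so that later calls by name are routed to it
    var customFunc = new CustomCompiledFunction(funcName, body, args, precompiler, argsMap, script);
    ParserFunction.RegisterFunction(funcName, customFunc);
    return new Variable(funcName);
}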

GetCompiledFunctionSignature() gets all of the function arguments and their types. If a type isn't provided, the argument is a string by default. You can find this function in the accompanying source code download (see the link to GitHub in the sidebar). Its argsMap output parameter maps each argument name to a Variable object whose Type field holds the actual argument type.

Finally, the Evaluate() method creates a CustomCompiledFunction object and registers it with the Parser. From then on, the object's Evaluate() method is triggered as soon as the CSCS Parser runtime encounters the function name.
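The register-then-dispatch flow can be sketched in miniature. Everything in this block is hypothetical: a dictionary stands in for the Parser's function registry, and a lambda stands in for the compiled delegate. The argument-count check mirrors the one in CustomCompiledFunction.Evaluate(). Names are matched case-insensitively, as CSCS tokens are.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of the registration flow: "cfunction" stores a compiled
// delegate under the function name; later calls by that name are dispatched to it
// after an argument-count check.
var registry = new Dictionary<string, (int Arity, Func<object[], object> Body)>(
    StringComparer.OrdinalIgnoreCase);

void Register(string name, int arity, Func<object[], object> body) =>
    registry[name] = (arity, body);

object Call(string name, params object[] callArgs)
{
    if (!registry.TryGetValue(name, out var entry))
    {
        throw new InvalidOperationException($"Unknown function [{name}]");
    }
    if (callArgs.Length != entry.Arity)
    {
        throw new ArgumentException(
            $"Function [{name}] arguments mismatch: {entry.Arity} declared, {callArgs.Length} supplied");
    }
    return entry.Body(callArgs);
}

// Stand-in for the delegate produced by Precompiler.Compile()
Register("helloCompiled", 2, a => $"Hello, {a[0]}! n = {a[1]}");

Console.WriteLine(Call("HELLOCOMPILED", "World", 5));
```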

Running Compiled Functions at Runtime

At runtime, the CustomCompiledFunction's Evaluate() method, shown below, is triggered:

protected override Variable Evaluate(ParsingScript script)
{
    List<Variable> args = script.GetFunctionArgs();
    if (args.Count != m_args.Length) 
    {
        throw new ArgumentException("Function [" + m_name + "] arguments mismatch: " + m_args.Length + " declared, " + args.Count + " supplied");
    }
    Variable result = RunCompiled(args);
    return result;
}

Listing 7 shows the RunCompiled() method, which binds the passed arguments to the argument lists of the compiled function. Here is the implementation of the Precompiler.Run() method:

public Variable Run(List<string> argsStr, 
                    List<double> argsNum, 
                    List<List<string>> argsArrStr, 
                    List<List<double>> argsArrNum, 
                    List<Dictionary<string, string>> argsMapStr,
                    List<Dictionary<string, double>> argsMapNum,
                    List<Variable> argsVar,
                    bool throwExc = true)
{
    if (m_compiledFunc == null) 
    {
        // "Late binding": compile on first use if the function hasn't been compiled yet
        Compile();
    }
    
    Variable result = m_compiledFunc.Invoke(argsStr, argsNum, argsArrStr, argsArrNum, argsMapStr, argsMapNum, argsVar);
    return result;
}

As you can see, the method body is almost trivial, because all of the heavy lifting was done earlier, during the compiling and caching stages.
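The "compile once, then reuse" idea behind Run() can be sketched independently of CodeDom. The hypothetical LazyCompiledFunction class below swaps the null check for .NET's Lazy<T>, and a factory delegate stands in for the real C# compilation step. This is an illustration of the pattern, not the Precompiler code.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of compile-on-first-use caching; the factory stands in
// for the real translate-and-compile step.
public sealed class LazyCompiledFunction
{
    private readonly Lazy<Func<double, double>> m_compiled;
    private int m_compileCount;

    public LazyCompiledFunction(Func<Func<double, double>> compile)
    {
        // ExecutionAndPublication: the factory runs at most once, even if
        // several threads call Run() at the same time.
        m_compiled = new Lazy<Func<double, double>>(() =>
        {
            Interlocked.Increment(ref m_compileCount);
            return compile();
        }, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // How many times the (expensive) compile step has actually run.
    public int CompileCount => m_compileCount;

    // The first call triggers compilation; later calls reuse the cached delegate.
    public double Run(double arg) => m_compiled.Value.Invoke(arg);
}
```

The design difference is thread safety. A plain null check like the one in Run() can compile twice if two threads race on the first call. Lazy<T> with ExecutionAndPublication rules that out.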

Listing 7: Implementation of the Custom Function RunCompiled() Method

public Variable RunCompiled(List<Variable> args)
{
    RegisterArguments(args);
    var argsStr    = new List<string>();
    var argsNum    = new List<double>();
    var argsArrStr = new List<List<string>>();
    var argsArrNum = new List<List<double>>();
    var argsMapStr = new List<Dictionary<string, string>>();
    var argsMapNum = new List<Dictionary<string, double>>();
    var argsVar    = new List<Variable>();
    
    for (int i = 0; i < m_args.Length; i++) 
    {
        Variable typeVar = m_argsMap[m_args[i]];
        if (typeVar.Type == Variable.VarType.STRING) 
        {
            argsStr.Add(args[i].AsString());
        }
        else if (typeVar.Type == Variable.VarType.NUMBER) 
        {
            argsNum.Add(args[i].AsDouble());
        }
        else if (typeVar.Type == Variable.VarType.ARRAY_STR) 
        {
            var subArrayStr = new List<string>();
            var tuple = args[i].Tuple;
            for (int j = 0; j < tuple.Count; j++) 
            {
                subArrayStr.Add(tuple[j].AsString());
            }
            argsArrStr.Add(subArrayStr);
        }
        else if (typeVar.Type == Variable.VarType.ARRAY_NUM) 
        {
            var subArrayNum = new List<double>();
            var tuple = args[i].Tuple;
            for (int j = 0; j < tuple.Count; j++) 
            {
                subArrayNum.Add(tuple[j].AsDouble());
            }
            argsArrNum.Add(subArrayNum);
        }
        else if (typeVar.Type == Variable.VarType.MAP_STR) 
        {
            var subMapStr = new Dictionary<string, string>();
            var tuple = args[i].Tuple;
            var keys = args[i].GetKeys();
            for (int j = 0; j < tuple.Count; j++) 
            {
                subMapStr.Add(keys[j], tuple[j].AsString());
            }
            argsMapStr.Add(subMapStr);
        }
        else if (typeVar.Type == Variable.VarType.MAP_NUM) 
        {
            var subMapNum = new Dictionary<string, double>();
            var tuple = args[i].Tuple;
            var keys = args[i].GetKeys();
            for (int j = 0; j < tuple.Count; j++) 
            {
                subMapNum.Add(keys[j], tuple[j].AsDouble());
            }
            argsMapNum.Add(subMapNum);
        }
        else if (typeVar.Type == Variable.VarType.VARIABLE) 
        {
            argsVar.Add(args[i]);
        }
    }
    
    Variable result;
    try
    {
        result = m_precompiler.Run(argsStr, argsNum, argsArrStr, argsArrNum, argsMapStr, argsMapNum, argsVar, false);
    }
    finally
    {
        // Restore the caller's scope even if the compiled code throws.
        ParserFunction.PopLocalVariables();
    }
    return result;
}

Performance Gains from Pre-compilation

In this section, you'll see whether precompiling scripting functions makes sense from a performance point of view. Let me spare you the suspense: yes, it does!

Consider this CSCS function:

cfunction exprCompiled(int n)
{
    start = pstime;
    complexExpr = 0.0;
    for (i = 0; i < n; i++) {
        baseVar = exp(sin(i) + cos(i));
        complexExpr += pow(baseVar, pi) * 2;
    }
    end = pstime;
    print("Result (Compiled) = " + complexExpr + " Time: ", (end - start), " ms. Runs: " + n);
    return complexExpr;
}

The cfunction keyword marks this version for precompilation. A “normal”, interpreted CSCS function looks almost the same. Apart from the label in the printed output, only the header differs:

function exprNotCompiled(n)
{
    start = pstime;
    complexExpr = 0.0;
    for (i = 0; i < n; i++) {
        baseVar = exp(sin(i) + cos(i));
        complexExpr += pow(baseVar, pi) * 2;
    }
    end = pstime;
    print("Result (Not Compiled) = " + complexExpr + " Time: ", (end - start), " ms. Runs: " + n);
    return complexExpr;
}
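For comparison, here is a hand-written C# equivalent of the loop in exprCompiled. It assumes that CSCS's pi is Math.PI. It shows the kind of straight-line numeric code the C# compiler can optimize. It is not the Precompiler's actual generated output. ExprBenchmark is an illustrative name.

```csharp
using System;

public static class ExprBenchmark
{
    // Hand-written counterpart of the exprCompiled loop (assumes CSCS pi == Math.PI).
    public static double ComplexExpr(int n)
    {
        double complexExpr = 0.0;
        for (int i = 0; i < n; i++)
        {
            double baseVar = Math.Exp(Math.Sin(i) + Math.Cos(i));
            complexExpr += Math.Pow(baseVar, Math.PI) * 2;
        }
        return complexExpr;
    }
}
```

Under that assumption, ExprBenchmark.ComplexExpr(100) should agree with the 3348.268... reported for both script versions.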

pstime is a CSCS function that returns the CPU time, in milliseconds, consumed by the current process. Here's how to run both functions and measure their execution times:

runs = 100;
exprCompiled(runs);
exprNotCompiled(runs);
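If you want to take similar measurements on the C# side, the process CPU time is exposed by System.Diagnostics.Process.TotalProcessorTime. The sketch below shows one plausible way to implement something like pstime. It is an assumption, not necessarily how CSCS computes it. CpuTimer is an illustrative name.

```csharp
using System;
using System.Diagnostics;

public static class CpuTimer
{
    // CPU time consumed so far by the current process, in milliseconds.
    public static double ProcessCpuMs()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.TotalProcessorTime.TotalMilliseconds;
        }
    }

    // Measures CPU time spent in an action, like "end - start" in the scripts.
    public static double Measure(Action action)
    {
        double start = ProcessCpuMs();
        action();
        double end = ProcessCpuMs();
        return end - start;
    }
}
```

CPU time excludes time the process spends waiting. A wall-clock Stopwatch is the alternative when waiting should count too.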

This is a sample output when runs = 100:

Result (Compiled) = 3348.26807565568 
Time:  3  ms.
Runs: 100
Result (Not Compiled) = 3348.26807565568 
Time:  68  ms.
Runs: 100

When the for-loop is executed 100 times, the precompiled version runs about 20 times faster (3 ms versus 68 ms)! I also tested different numbers of runs. Table 1 shows the results.

Note that the running time of the non-compiled version grows much faster than that of the compiled one. The speedup climbs from about 23 times at 100 runs to about 91 times at 50,000 runs. I think the reason is that optimized, compiled C# code handles loops far more efficiently than the scripting version, which executes a loop statement by statement. This means that there's still some work to do to make the interpreted version more efficient.
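The speedup figures can be checked directly from Table 1. The sketch below copies the table's timings and computes two things: the per-row speedup, and the marginal cost per loop iteration between the first and last rows. SpeedupTable is an illustrative name.

```csharp
using System;

public static class SpeedupTable
{
    // Runs and timings (ms) copied from Table 1.
    public static readonly int[] Runs = { 100, 500, 1000, 5000, 10000, 50000 };
    public static readonly double[] CompiledMs = { 3, 8, 14, 44, 83, 345 };
    public static readonly double[] InterpretedMs = { 68, 298, 577, 2888, 5769, 31428 };

    // Speedup = interpreted time / compiled time, row by row.
    public static double[] Speedups()
    {
        var result = new double[Runs.Length];
        for (int k = 0; k < Runs.Length; k++)
        {
            result[k] = InterpretedMs[k] / CompiledMs[k];
        }
        return result;
    }

    // Marginal cost per loop iteration between the first and last rows, in ms.
    // Subtracting the first row cancels any fixed per-call cost.
    public static double MarginalMsPerIteration(double[] timesMs)
    {
        int last = Runs.Length - 1;
        return (timesMs[last] - timesMs[0]) / (Runs[last] - Runs[0]);
    }
}
```

The results are:

- Speedups come out at roughly 22.7, 37.3, 41.2, 65.6, 69.5, and 91.1.
- The marginal cost is about 0.0069 ms per iteration compiled and about 0.63 ms interpreted, a ratio of roughly 92.

Because the marginal ratio is close to the ratio at 50,000 runs, the numbers suggest that a small fixed per-call cost in the compiled version (roughly 2 ms here) is what holds the speedup down at low run counts.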

Wrapping Up

The main disadvantage of a scripting (or interpreted) language is that it's usually much slower than a compiled language. In this article, you saw how to get the best of both worlds by translating a script into C# at runtime and then compiling the generated C# code into an assembly.

Note that this technique pays off only if you run the compiled code more than a few times. Translating and compiling the script is a one-time cost, and faster execution on later calls has to recover it.
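The break-even point is simple arithmetic. Compiling first wins once compileMs + N × compiledMs ≤ N × interpretedMs, that is, once N ≥ compileMs / (interpretedMs − compiledMs). The article doesn't report compilation times, so the 500 ms compile cost used below is purely hypothetical. The per-call times (68 ms and 3 ms) are the 100-run values from Table 1. BreakEven is an illustrative name.

```csharp
using System;

public static class BreakEven
{
    // Smallest number of calls N for which compiling first is no slower overall:
    //   compileMs + N * compiledMs <= N * interpretedMs
    //   => N >= compileMs / (interpretedMs - compiledMs)
    public static long MinCalls(double compileMs, double interpretedMs, double compiledMs)
    {
        double savedPerCall = interpretedMs - compiledMs;
        if (savedPerCall <= 0)
        {
            throw new ArgumentException("Compiled code is not faster per call; compiling never pays off.");
        }
        return (long)Math.Ceiling(compileMs / savedPerCall);
    }
}
```

With the hypothetical 500 ms compile cost, BreakEven.MinCalls(500, 68, 3) returns 8, because 500 / 65 ≈ 7.7 rounds up to 8 calls.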

You also saw an example of the performance gain from precompiling a script that performs mathematical calculations. The compiled version ran between about 23 and 91 times faster, depending on the number of loop iterations. Math-heavy code like this is where you see the biggest speed improvements. In general, you should evaluate case by case whether precompiling a script makes sense. For short scripts, and for most scripts without big loops, it probably doesn't matter whether a script runs for 3 or for 68 milliseconds. In other words, always remember Donald Knuth's famous quote:

Premature optimization is the root of all evil. (Donald Knuth)

One possible improvement is to save the compiled assemblies to disk and then load all of them at startup. To do this, set CompilerParams.GenerateInMemory to false. Then specify the assembly name, either with the CompilerParams.OutputAssembly property or with the “/out” command-line option in the CompilerParams.CompilerOptions property. See Listing 3 for details on setting different compiler options.
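Here is a minimal sketch of that improvement using the System.CodeDom.Compiler and Microsoft.CSharp APIs the article relies on. It assumes .NET Framework or Mono. On .NET Core and later, CodeDom compilation is, as far as I know, not supported and throws PlatformNotSupportedException. DiskAssemblyCache, the output path, and the one-directory cache layout are hypothetical. Binding the loaded types back to CSCS function names is not shown.

```csharp
using System;
using System.CodeDom.Compiler;
using System.IO;
using System.Reflection;
using System.Text;
using Microsoft.CSharp;

public static class DiskAssemblyCache
{
    // Compiler parameters that write the assembly to disk instead of keeping it in memory.
    public static CompilerParameters CreateDiskParameters(string outputPath)
    {
        var parameters = new CompilerParameters();
        parameters.GenerateInMemory = false;   // keep the DLL on disk
        parameters.GenerateExecutable = false; // build a library, not an .exe
        parameters.OutputAssembly = outputPath;
        parameters.ReferencedAssemblies.Add("System.dll");
        return parameters;
    }

    // Compiles C# source to outputPath and returns the assembly path; throws on errors.
    public static string CompileToDisk(string source, string outputPath)
    {
        using (var provider = new CSharpCodeProvider())
        {
            CompilerResults results = provider.CompileAssemblyFromSource(CreateDiskParameters(outputPath), source);
            if (results.Errors.HasErrors)
            {
                var message = new StringBuilder("Compilation failed:");
                foreach (CompilerError error in results.Errors)
                {
                    if (!error.IsWarning)
                    {
                        message.Append(' ').Append(error.ErrorText);
                    }
                }
                throw new InvalidOperationException(message.ToString());
            }
            return results.PathToAssembly;
        }
    }

    // At startup: load every previously compiled assembly from a cache directory.
    public static Assembly[] LoadAll(string cacheDirectory)
    {
        string[] files = Directory.GetFiles(cacheDirectory, "*.dll");
        var assemblies = new Assembly[files.Length];
        for (int i = 0; i < files.Length; i++)
        {
            assemblies[i] = Assembly.LoadFrom(files[i]);
        }
        return assemblies;
    }
}
```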

The complete, up-to-date precompiler code is available in the GitHub repository (see the links in the sidebar).

I'd be happy to hear from you about how you are pre-compiling your scripts and what performance gains you observe. Also, it would be interesting to hear if you can use the compilation explained here for any other scripting language.

References

GitHub CSCS Source Code: https://github.com/vassilych/cscs

VS Code CSCS Debugger Extension: https://marketplace.visualstudio.com/items?itemName=vassilik.cscs-debugger

Split-and-Merge Algorithm and CSCS Language Free E-book: https://www.syncfusion.com/ebooks/implementing-a-custom-language


Table 1: Comparison of the Running Times for the Compiled and Not Compiled Functions

Runs (n)	Compiled Version, ms	Not Compiled Version, ms	Numerical Result
100	3	68	3348.26807565568
500	8	298	16718.7736536161
1000	14	577	33259.1836291118
5000	44	2888	166297.860117355
10000	83	5769	332591.313075893
50000	345	31428	1662530.99749815

Compiling Scripts to Get Compiled Language Performance