New computer languages are rare and successful ones are rarer still, yet Microsoft decided to create a new language to go along with the .NET Developer Platform. Why weren't existing languages good enough?

Over the years the topic of a simpler language entered the discussions of the Microsoft Visual C++ development team. While C++ gives programmers a tremendous amount of power and control, this power often comes at a price. C++ is more complex than languages such as Visual Basic and Java. It may take several more lines of C++ code to perform an operation requiring a single line of code in languages like Visual Basic or Java. These discussions led to a search for a way to marry the productivity of Visual Basic with the syntax and power of C++.

After the release of Visual Studio 6.0, a number of factors led the Microsoft Developer Division to a big decision: to build a multi-language Common Language Runtime. It was obvious that Visual Basic would be an important language targeting this runtime. It was less clear what to do with C++.

One of the options was to modify C++ to work with the new runtime. When we discussed the idea of a simpler C++ with some customers, they provided some interesting feedback. They were intrigued with the idea of a simpler and more productive language, but they didn't want us to mess with C++. Therefore, we decided to create two languages in the C family. The Managed Extensions to C++ would be a minimal change to the C++ language to support the Common Language Runtime while C# would be a clean-slate language targeted at programmers who know C++ but want a cleaner, more productive language.

A Tool that We Can Use

The vast majority of Microsoft software is written using C or C++, and when we were designing C# we knew that we had to create a language that would be attractive to those programmers.

Attractive is a hard concept to nail down, but two important facets of attractiveness are comfort and capability.

Comfort is all about familiarity. Because C# has a syntax that is similar to C++, it's easy for C++ programmers to read and understand C# code. C# also has a similar conceptual model and there are C++ analogs to many C# design principles. Here's a simple C# program:

public class Example
{
    public static void Main(string[] args)
    {
        foreach (string s in args)
        {
            System.Console.WriteLine(s);
        }
    }
}

This code is easily understood by any C++ programmer.

Capability is highly valued by C++ programmers; it's commonly said that C++ gives you enough rope to hang yourself. Java, on the other hand, is a more minimal language that omits some features that might get programmers into trouble.

Philosophically, C# is somewhere between C++ and Java. The mere fact that C# runs on a managed runtime means that it can't have as much flexibility and control as C++ does. We didn't want to omit features that would be useful for professional programmers merely because they could be used incorrectly. Features such as user-defined value types and operator overloading are in this category.

This philosophy was guided by the .NET Frameworks team, who were writing their frameworks in C# at the same time we were building the language. They were able to provide us feedback on using C# to write a framework, and on using C# to write framework clients. This dual perspective was invaluable; writing the framework validated that C# could be used for high-end programming, and using the framework validated that it was simple and straightforward to write client code.

We also benefited from members of Microsoft Research, who shared how language research had attacked certain problems.

And finally, because C# was designed at the same time as VB .NET, the Managed Extensions to C++ and other .NET languages, there was a considerable amount of collaboration between the language and runtime teams. The Common Language Specification (CLS) was defined to specify the features that a language can expect other languages to support, which enables interop between languages.

Building Components

The .NET Developer Platform (NDP) is inherently a component-based environment and since C# exists only to create programs for NDP, it's not surprising that it's component-oriented.

There is no generally accepted definition for what a component is but, from the C# perspective, components are self-contained entities that support object-oriented programming. From the runtime perspective, this means that I can package one or more components into a .NET assembly and deploy them as a self-contained unit.

From the programmer's perspective, it means that everything that it takes to create a component is contained within a C# source file. C# provides important capabilities that make creating and using components easy.

Properties and Indexers

Nearly all components have a set of attributes that they support. A textbox exposes the text that's in the textbox and the font for the text. A filename exposes the directory and the file extension.

When writing a component, programmers have had two choices for how to implement these attributes. The easiest choice is to make the attribute a public field that the user can access directly. In C#, it would look like this:

public class FileInfo
{
    public string filename;
    public string directory;
    public string extension;
    
    public FileInfo(string filename)
    {
        // break apart into directory,
        // short name, and extension
    }
}

That's simple to code and simple to use, but it has a few problems. The biggest is inefficiency: because the class doesn't know when (or if) the user will want the directory or the extension, the constructor has to parse the filename every time, whether or not the results are ever used.

The property design pattern was invented to solve this problem. The public fields are replaced with get and set accessor functions:

public class FileInfo
{
    public string getFilename()
    {
        return filename;
    }
    
    public void setFilename(string filenameNew)
    {
        filename = filenameNew;
    }
    
    public string getDirectory()
    {
        // parse directory out, and return it
        return System.IO.Path.GetDirectoryName(filename);
    }
    
    public string getExtension()
    {
        // parse extension out, and return it
        return System.IO.Path.GetExtension(filename);
    }
    
    private string filename;
    
    public FileInfo(string filename)
    {
        this.filename = filename;
    }
}

That's much nicer for the implementer. Since the user calls a function to get the values, you can postpone the work to create a value until the user asks for it. You can also cause other actions to take place when the user sets a value, such as repainting a control when its label is updated.

Unfortunately, you've made things tougher for the user. In the original version, the user would write:

FileInfo info = new FileInfo(filename);
string extension = info.extension;

While in the property version, the user must write:

FileInfo info = new FileInfo(filename);
string extension = info.getExtension();

It's worse if you want to increment a value. The simple version:

counter.value++;

Becomes this:

counter.setValue(counter.getValue() + 1);

That's considerably tougher to read or write. Additionally, the fact that getFilename() and setFilename() are related may not be obvious in the documentation or when using IntelliSense-like features.

To keep the advantages of the property idiom and still allow the user to have a model that looks like fields, C# provides full language support for properties. Using properties, our class would be written as:

public class FileInfo
{
    public string Filename
    {
        get
        {
            return filename;
        }
        set 
        {
            filename = value;
        }
    }

    public string Directory
    {
        get 
        {
            // parse directory out, and return it
            return System.IO.Path.GetDirectoryName(filename);
        }
    }

    public string Extension
    {
        get 
        {
            // parse extension out, and return it
            return System.IO.Path.GetExtension(filename);
        }
    }

    private string filename;

    public FileInfo(string filename)
    {
        this.filename = filename;
    }
}

The user now gets an attribute that can be used like a field, and the class author can write the class in an efficient manner.
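To make the increment example from earlier concrete, here's a minimal sketch of a hypothetical Counter class (the class and its SetCount member are illustrative only). Because Value is a property, callers can write counter.Value++, and the compiler turns it into a call to the get accessor followed by a call to the set accessor, so the setter's side effect still runs.

```csharp
// Hypothetical Counter class, for illustration only.
public class Counter
{
    private int current;
    private int setCount;

    // Looks like a field to callers, but get and set run code.
    public int Value
    {
        get { return current; }
        set
        {
            current = value;   // 'value' is the implicit setter parameter
            setCount++;        // side effect: record that the setter ran
        }
    }

    // Number of times the Value setter has executed.
    public int SetCount
    {
        get { return setCount; }
    }
}
```

With this class, after counter.Value++ and counter.Value += 5, Value is 6 and SetCount is 2, because each statement runs the setter exactly once.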

The use of properties raises a couple of concerns. One is whether programmers will confuse properties with fields and not realize that they're calling a method that could have a side effect. This rarely causes real problems in practice: most .NET components expose properties rather than fields, so programmers are used to properties and understand that they're executing code when they get or set a value. With properties, the Text property of a textbox control and the text actually displayed on the control can't drift apart, because the property code keeps them in sync.

A second concern is efficiency. Properties replace simple, fast field accesses with method calls. Fortunately, the JIT compiler can recognize simple property accessors and inline them, generating code equivalent to a direct field access, so in practice there's little or no loss of efficiency in using properties.

Properties allow a user to use something that looks like a field. It's also useful for some components to be treated as if they are arrays. For example, it might be reasonable to treat an Employees class as if it is an array of Employee objects. C# supports indexers to provide this functionality, which are like properties with an additional index parameter:

class Employees
{
    Employee[] employees;

    public Employee this[int index]
    {
        get 
        {
            return employees[index];
        }
        set
        {
            employees[index] = value;
        }
    }
    
    // other stuff here...
}

The user can now write code like:

Employee current = employees[i];

Indexers can be overloaded to support indexing on multiple types (int and string, for example), and can also be multi-dimensional.
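As a sketch of those two points, the following hypothetical classes (not from any library) overload an Employees indexer on int and string, and give a Grid class a two-dimensional indexer. The string lookup is a simple linear search, chosen only to keep the example short.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration only.
public class Employee
{
    public string Name;
    public Employee(string name) { Name = name; }
}

public class Employees
{
    private readonly List<Employee> employees = new List<Employee>();

    public void Add(Employee e) { employees.Add(e); }

    // Index by position.
    public Employee this[int index]
    {
        get { return employees[index]; }
        set { employees[index] = value; }
    }

    // Overload: index by name; returns null if no employee has that name.
    public Employee this[string name]
    {
        get
        {
            foreach (Employee e in employees)
            {
                if (e.Name == name)
                    return e;
            }
            return null;
        }
    }
}

// A multi-dimensional indexer simply takes more than one parameter.
public class Grid
{
    private readonly int[,] cells;

    public Grid(int rows, int cols) { cells = new int[rows, cols]; }

    public int this[int row, int col]
    {
        get { return cells[row, col]; }
        set { cells[row, col] = value; }
    }
}
```

Callers can then write staff[0], staff["Ann"] or grid[1, 2] = 7, and the compiler picks the indexer whose parameter types match.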

Delegates and Events

The .NET Frameworks support an event-based model for many operations. In this model, when a user presses a button on a form, a timer expires or a SQL connection goes down, user code can be executed.

To support such a model, there needs to be a way to hook up a function so that it can be called when an event occurs. In the C++ world, this would often be done with function pointers. In the .NET world, delegates are used to perform this function. Delegates are similar to function pointers in that they refer to a specific function, but have a few important advantages.

First, delegates are type-safe; they can only hook up to functions that have the proper signature. Second, they encapsulate not only the function to call but also the instance to use when making the call, which allows delegates to hook up to both static and instance methods. Finally, delegates are multicast, which means that a single delegate can call multiple functions when invoked.
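Here is a minimal sketch of all three properties, using a hypothetical Notify delegate type and hypothetical Logger and Audit classes. One delegate is bound to an instance method (so it carries the Logger object along with the method), another is bound to a static method, and += combines them so that a single invocation calls both.

```csharp
using System.Collections.Generic;

// Hypothetical delegate type: any target must take a string and return void.
public delegate void Notify(string message);

public class Logger
{
    private readonly string prefix;
    public readonly List<string> Lines = new List<string>();

    public Logger(string prefix) { this.prefix = prefix; }

    // Instance method: a delegate to it also captures this Logger object.
    public void Write(string message) { Lines.Add(prefix + message); }
}

public static class Audit
{
    public static int Count;

    // Static method: no instance is involved.
    public static void Record(string message) { Count++; }
}

public static class MulticastDemo
{
    // Returns the number of methods the combined delegate calls.
    public static int Run(Logger log)
    {
        Notify notify = new Notify(log.Write);   // instance method target
        notify += new Notify(Audit.Record);      // add a static method target
        notify("started");                       // invokes both, in order
        return notify.GetInvocationList().Length;
    }
}
```

Type safety shows up at compile time: trying to create a Notify from a method with a different signature, such as one taking an int, is rejected by the compiler.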

It's possible to build event-based systems using only delegates by exposing them as public fields, but delegates by themselves don't protect against user error. A subscriber that means to add its delegate to the existing one with += can just as easily write = and replace it, silently disconnecting every other subscriber.

To make the model more robust, events are layered on top of delegates. Events are somewhat like properties, in that they provide restricted access to an underlying field.

Here's an example of a class supporting an event:

using System;

public class EventTest
{
    public delegate void MyHandler(string s);

    public event MyHandler Test;

    public void TestEvent(string s)
    {
        if (Test != null)
            Test(s);
    }
}

The delegate defines the signature of the handler function, and the TestEvent() method fires the event. The following class hooks up to the event:

public class Test
{
    public static void Main()
    {
        EventTest et = new EventTest();

        et.Test += new EventTest.MyHandler(Function);

        et.TestEvent("Hello");
    }

    public static void Function(string s)
    {
        Console.WriteLine("Function: {0}", s);
    }
}

This code creates an instance of the EventTest class, creates a delegate that points to Function and then attaches it to the event using the += operator.
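The difference between a public delegate field and an event, described earlier, can be seen in a short sketch. The Handler, Publisher, Subscriber and EventDemo names below are hypothetical. The same slip (writing = instead of +=) silently drops a subscriber from the field, while on the event it isn't allowed outside the declaring class.

```csharp
// Hypothetical types for illustration only.
public delegate void Handler(string s);

public class Publisher
{
    // Public delegate field: any caller can overwrite it with '='.
    public Handler ChangedField;

    // Event: outside this class, callers may only use += and -=.
    public event Handler Changed;

    public void Raise(string s)
    {
        if (ChangedField != null) ChangedField(s);
        if (Changed != null) Changed(s);
    }
}

public class Subscriber
{
    public int Calls;
    public void OnRaised(string s) { Calls++; }
}

public static class EventDemo
{
    public static void Run(Subscriber first, Subscriber second)
    {
        Publisher p = new Publisher();

        p.ChangedField += new Handler(first.OnRaised);
        p.ChangedField = new Handler(second.OnRaised);  // bug: replaces 'first'

        p.Changed += new Handler(first.OnRaised);
        p.Changed += new Handler(second.OnRaised);
        // p.Changed = new Handler(second.OnRaised);   // would not compile here

        p.Raise("tick");
    }
}
```

When Run finishes, the first subscriber has been called once (through the event only) and the second twice, because the assignment to the field discarded the first subscriber's handler.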

Attributes

When designing a complex system, you often need to pass declarative information to the runtime part of a system. A transactional system, for example, needs to know how a transaction should be applied to a specific object.

In most languages, this information must be part of the class definition. Typically, this is done by adding a function that returns the information to the class. The runtime component can then find this function and call it to get the information. This works, but has several disadvantages.

The first disadvantage is that the programmer has to clutter their class with a function that's merely there to return a static piece of information. A more important disadvantage is that there's no compile-time validation that the function is returning the proper information; if it returns a string rather than an integer, the error won't be found until the code is executed.

A final disadvantage is that this scheme only works well for information about a whole class. It's very difficult to pass information about a specific parameter on a method.

Attributes are the C# solution to this problem. An attribute is a piece of declarative information that's placed on a program element (including classes, methods, return values, events, etc.). When the program is compiled, the compiler validates that the attribute is correct in that usage and stores the attribute information in the metadata for that element. The runtime component can then use reflection to obtain the value of the attribute. The .NET serialization subsystem uses attributes to determine what it should do when serializing a class:

[Serializable]
class Employee
{
    string name;
    string address;
    
    [NonSerialized]
    ArrayList cachedPayroll;   // System.Collections.ArrayList
}

The Serializable attribute tells the runtime that it's okay to serialize this class, while the NonSerialized attribute tells it not to serialize the cachedPayroll field.

Attributes provide the designer with a very flexible way of specifying and obtaining information. The .NET Frameworks are heavy users of attributes for things like transaction support, marking a method as a Web service, or specifying the details of interop or XML serialization.

Attributes are also extensible. Attributes are merely classes that inherit from the System.Attribute class. The use of an attribute is roughly analogous to calling the constructor of the class. To mark classes as secure, you could write the following class:

class SecureAttribute: System.Attribute
{
    bool secure;
    
    public SecureAttribute(bool secure)
    {
        this.secure = secure;
    }
    
    public bool Secure
    {
        get { return secure; }
    }
}

Apply the attribute to a class:

[Secure(true)]
class TaxRateClass
{
    // class members here
}

Then fetch it at runtime:

Type type = typeof(TaxRateClass);
// Type.GetCustomAttributes returns object[], filtered to SecureAttribute
object[] atts = type.GetCustomAttributes(typeof(SecureAttribute), true);
foreach (SecureAttribute att in atts)
{
    if (att.Secure)
    {
        // ... handle true case here
    }
}

XML Doc Comments

Java provides a feature called JavaDoc, where programmers can add documentation to their code as they write it. C# programmers can do the same thing using XML Doc Comments:

/// <summary>
/// The distance between two points
/// </summary>
/// <param name="pt1">Point 1</param>
/// <param name="pt2">Point 2</param>
/// <returns>The distance</returns>
public static float Distance(MyPointF pt1, MyPointF pt2)
{
    MyPointF delta = pt1 - pt2;
    return (float) Math.Sqrt(delta.X * delta.X + delta.Y * delta.Y);
}

The comments are written as XML and are extracted out to a separate XML file as part of the compilation process. Any XML can be used as long as it's well-formed, so it's easy to support company-specific information. The generated XML can be combined with information obtained through reflection to generate documentation.

Additionally, the Visual Studio .NET IntelliSense engine presents this information as part of the coding process.

User-Defined Primitive Types

Most programming languages supply only a basic set of predefined types, such as int, short, float and double. Using such types is second nature for most programmers, so they would have no trouble understanding the following:

int startValue = 55;
int endValue = 88;
for (int i = startValue; i != endValue; i++)
{
    int result = i * i + 35;
    Console.WriteLine("{0} {1}", i, result);
}
Console.WriteLine("start, end: {0} {1}", startValue, endValue);

It's very clear what this code does. If you need to implement the same algorithm with numbers of unlimited precision, you could use a Bignum class and write the following code:

Bignum startValue = new Bignum(55);
Bignum endValue = new Bignum(88);
for (Bignum i = startValue; i != endValue; i = i.add(new Bignum(1)))
{
    Bignum result = i.multiply(i).add(new Bignum(35));
    Console.WriteLine("{0} {1}", i, result);
}
Console.WriteLine("start, end: {0} {1}", startValue, endValue);

There are several issues with this code. The first is that it's considerably more complex than the first version, which makes it more difficult to write, code review and maintain. Bignums don't work any differently than integers but in using them we're forced to write drastically different code.

The second issue is that there are two subtle bugs in the Bignum version of the code. Can you find them?

The first bug is in the completion test. This statement is a problem:

i != endValue

Since Bignum is a class, this expression doesn't compare the values of i and endValue; it compares the references. Because i never refers to the same instance as endValue, the expression is always true and the loop never terminates.

The second bug is in the initialization of the for loop. The following statement doesn't work the way the same statement in the int version does:

Bignum i = startValue;

Since Bignum is a class, assigning one Bignum variable to another copies the reference, so after this statement both i and startValue refer to the same instance. If add() modifies that instance in place, incrementing i also changes startValue, and startValue ends up being overwritten.

Writing types such as Bignum isn't an everyday programming task, but it is important that such types are easy to use and behave the way that programmers expect them to. C# provides three features that can make Bignum behave the same way int does.

First, C# allows the user to author types that have value semantics, just like the predefined types do. In the .NET world these are known as value types, and they're defined in C# using the struct keyword. Value types are allocated on the stack or inline as part of other objects, and assignment copies the value rather than a reference. A Bignum type written as a value type prevents the second bug in our version. It's also more efficient in arrays, because there isn't a separate heap allocation for each element.
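To make the difference concrete, here's a sketch with two hypothetical types that differ only in being declared with class or struct. Copying and then modifying shows the aliasing behind the second Bignum bug, and the default equality shows the reference comparison behind the first. (For the struct, == isn't defined unless you overload it, but the inherited Equals compares field values.)

```csharp
// Hypothetical types: identical except for class vs struct.
public class RefNumber
{
    public int Value;
    public RefNumber(int v) { Value = v; }
}

public struct ValNumber
{
    public int Value;
    public ValNumber(int v) { Value = v; }
}

public static class SemanticsDemo
{
    // Returns startValue.Value after "i = startValue; i.Value++" with a class.
    public static int WithClass()
    {
        RefNumber startValue = new RefNumber(55);
        RefNumber i = startValue;   // copies the reference: one shared object
        i.Value++;                  // the change is visible through startValue
        return startValue.Value;
    }

    // Same steps with a struct.
    public static int WithStruct()
    {
        ValNumber startValue = new ValNumber(55);
        ValNumber i = startValue;   // copies the value: two independent copies
        i.Value++;                  // startValue is unaffected
        return startValue.Value;
    }

    // Class ==: compares references, so two equal-valued objects differ.
    public static bool ClassEquals()
    {
        return new RefNumber(88) == new RefNumber(88);
    }

    // Struct Equals: compares field values.
    public static bool StructEquals()
    {
        return new ValNumber(88).Equals(new ValNumber(88));
    }
}
```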

The second feature that helps out is user-defined conversions. It's always a safe operation to create a Bignum from an int. Adding a user-defined conversion enables us to simplify the code. Instead of writing:

Bignum startValue = new Bignum(55);

We can simply write:

Bignum startValue = 55;

The final feature to give us “int fidelity” is operator overloading. Rather than calling methods to perform operations, we can use the existing arithmetic and comparison operators. Using these three features, the code that we write for Bignum is identical to the code for int:

Bignum startValue = 55;
Bignum endValue = 88;
for (Bignum i = startValue; i != endValue; i++)
{
    Bignum result = i * i + 35;
    Console.WriteLine("{0} {1}", i, result);
}
Console.WriteLine("start, end: {0} {1}", startValue, endValue);

Being able to create new primitives enables programmers to leverage their existing knowledge resulting in superior code. In fact, the System.Decimal type in C# is implemented as a user-defined value type.
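Here's a minimal sketch of how such a Bignum might be declared. It's an assumption for illustration, not a published type: to stay short, it stores its value in System.Numerics.BigInteger (itself a value type, added in .NET Framework 4, that uses these same features) and implements only the operators the loop above needs.

```csharp
using System;
using System.Numerics;

// Hypothetical Bignum: a value type with a conversion and overloaded operators.
public struct Bignum : IEquatable<Bignum>
{
    private readonly BigInteger value;

    public Bignum(BigInteger value) { this.value = value; }

    // Converting int to Bignum can never lose information, so make it implicit.
    public static implicit operator Bignum(int n) { return new Bignum(n); }

    public static Bignum operator +(Bignum a, Bignum b) { return new Bignum(a.value + b.value); }
    public static Bignum operator *(Bignum a, Bignum b) { return new Bignum(a.value * b.value); }
    public static Bignum operator ++(Bignum a) { return new Bignum(a.value + 1); }

    // == and != compare values, not references.
    public static bool operator ==(Bignum a, Bignum b) { return a.value == b.value; }
    public static bool operator !=(Bignum a, Bignum b) { return a.value != b.value; }

    public bool Equals(Bignum other) { return value == other.value; }
    public override bool Equals(object obj) { return obj is Bignum && Equals((Bignum)obj); }
    public override int GetHashCode() { return value.GetHashCode(); }
    public override string ToString() { return value.ToString(); }
}

public static class BignumDemo
{
    // The loop from the article, returning the last computed result.
    public static Bignum LastResult()
    {
        Bignum startValue = 55;
        Bignum endValue = 88;
        Bignum result = 0;
        for (Bignum i = startValue; i != endValue; i++)
        {
            result = i * i + 35;
        }
        return result;
    }
}
```

Overloading == and != also calls for overriding Equals and GetHashCode, so that every way of comparing two Bignums gives the same answer.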

Solving Real-World Problems

One of the drawbacks of C++ is its complexity; simplicity was an important design goal for C#.

It's possible to go overboard on simplicity and language purity but purity for purity's sake is of little use to the professional programmer. We therefore tried to balance our desire to have a simple and concise language with solving the real-world problems that programmers face. We've also kept in mind the difference between class author complexity and class consumer complexity; if adding complexity in the author's world simplifies the consumer's world significantly it's worth considering.

Value types, operator overloading and user-defined conversions all add complexity to the language, but allow an important user scenario to be tremendously simplified.

We also elected to allow the user to perform some pointer-based operations from within C#. Because the runtime can't verify such operations to be type-safe, they are known as unsafe operations, and the runtime only allows them to execute if the code is fully trusted. Under the default security policy, full trust is only granted to code that is local to the machine, so if such code is part of a Web page or on a network share, it won't run. Unsafe code isn't a commonly used feature but, in some cases, it's critical for getting the performance you need or for interoperating with existing code.

The .NET Runtime also provides solutions to important real-world problems. One of the perennial problems with software running on Windows is “DLL Hell,” which occurs when one version of software installs a different version of a DLL that another program depends on. The problem is solved in the .NET world by adding version information to an assembly. This allows different versions of the same assembly to co-exist “side-by-side” on the same system and programs only use the version they were built and tested against.

In addition, multiple versions of the runtime can exist on the same machine, with each program using the version it was built and tested for. These side-by-side features make code much more robust.

Another important problem addressed by the .NET Developer Platform is the use of existing code. The last thing most programmers want to do is to take existing, well-debugged code and port it to another language. The runtime provides interop features that enable .NET code to use existing COM components or code contained in DLLs, and the Managed Extensions to C++ enable the user to mix new managed code with existing unmanaged C++ code.

Programmer Efficiency

While we didn't want to make wholesale changes to the C++ syntax, we did do a bit of tweaking to make the programmer's job easier. One of the areas we addressed was how programmers deal with arrays and collections. One of the most common tasks is traversing the elements of an array. Here's a bit of code I've written thousands of times:

for (int i = 0; i < arr.Count; i++)
{
    string s = (string) arr[i];
    // use s here
}

As I write that loop, there are a number of decisions that I make. First, I need to choose the name for a loop index. When I choose that name, I have to do a mental check to make sure I'm choosing a name I have not used before. Next, I have to set the termination condition. I have to remember what I'm iterating over and remember how to figure out how long it is. In the body of the loop, I have to remember the array and index names again and the type of the array elements.

The only two important pieces of information are the name of the array and the type of the array elements. The rest is just busy work that the programmer has to spend time on. C# adopts the foreach construct found in languages such as Perl, which simplifies the code to:

foreach (string s in arr)
{
    // use s here
}

The foreach construct doesn't require anything extra; each piece of information is only mentioned once. This not only makes it much harder to make a mistake but it also makes it clear what the code does. Foreach also enables iterating over types such as database cursors, which have no count or method of indexing.
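The C# compiler doesn't require a particular interface for this: foreach works with any type that has a public GetEnumerator() method returning an object with MoveNext() and Current. Here's a minimal sketch using a hypothetical LineCursor over a TextReader, which, like a database cursor, can only move forward and has no count or indexer.

```csharp
using System.IO;

// Hypothetical cursor-like type: no Count, no indexer, only "next row".
public class LineCursor
{
    private readonly TextReader reader;

    public LineCursor(TextReader reader) { this.reader = reader; }

    // foreach only needs a public GetEnumerator() method...
    public Enumerator GetEnumerator() { return new Enumerator(reader); }

    // ...returning a type with MoveNext() and Current.
    public class Enumerator
    {
        private readonly TextReader reader;
        private string current;

        public Enumerator(TextReader reader) { this.reader = reader; }

        public bool MoveNext()
        {
            current = reader.ReadLine();   // null means there are no more lines
            return current != null;
        }

        public string Current { get { return current; } }
    }
}

public static class CursorDemo
{
    // Wraps each line in brackets and concatenates them.
    public static string Join(string text)
    {
        string joined = "";
        foreach (string line in new LineCursor(new StringReader(text)))
        {
            joined += "[" + line + "]";
        }
        return joined;
    }
}
```

In practice such a class would usually implement IEnumerable as well, so that it also works with code written against that interface.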

C# also makes it easier to use primitive types in collections. To be able to use a value type such as int in a collection, you need some way to convert it to a reference type. In Java, this is done by using a wrapper class, so that storing an int requires putting it inside an Integer class instance, and that instance is then added to the collection. In this example, four integers are put into a Vector, and then printed out:

Vector vec = new Vector();

vec.addElement(new Integer(-200));
vec.addElement(new Integer(100));
vec.addElement(new Integer(400));
vec.addElement(new Integer(-300));

for (int i = 0; i < vec.size(); i++)
{
    int e=((Integer)vec.elementAt(i)).intValue();
    System.out.println(e);
}

In C#, the same operation is performed automatically through boxing. Whenever a value type is used where type object is required, the value is automatically copied into a reference-type box, which simplifies the user code. In this example, there's no need to manually wrap the integer values or extract them:

ArrayList arr = new ArrayList();

arr.Add(-200);
arr.Add(100);
arr.Add(400);
arr.Add(-300);

for (int i = 0; i < arr.Count; i++)
{
    Console.WriteLine(arr[i]);
}
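Two details of boxing are worth a short sketch (the BoxingDemo class is hypothetical). Getting the value back out requires an explicit cast that must name the exact boxed type, and the box holds a copy of the value rather than a reference to the original variable.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    // Each element is a boxed int; unboxing needs an explicit cast.
    public static int SumAll(ArrayList list)
    {
        int total = 0;
        foreach (object o in list)
        {
            total += (int)o;
        }
        return total;
    }

    // The box holds a copy taken at the moment of boxing.
    public static bool BoxIsCopy()
    {
        int n = 5;
        object boxed = n;   // boxing copies the value into a new heap object
        n = 6;              // changing the variable doesn't touch the box
        return (int)boxed == 5;
    }

    // Unboxing must use the exact boxed type: int can't be unboxed as long.
    public static bool UnboxToWrongTypeFails()
    {
        object boxed = 5;
        try
        {
            long l = (long)boxed;   // throws InvalidCastException
            return l < 0;           // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

To get a long from a boxed int, unbox first and then convert: (long)(int)boxed.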

C# also addresses some C++ constructs that have led to common errors. A common error in C++ code is to write:

if (count = 5)
{
}

Here “=” is used instead of “==”. Under the C# rules, the expression in an if statement must be of type bool, and int can't be implicitly converted to bool, so the preceding code generates a compile-time error.

Finding out more

The following are links to C# and .NET information:

Visual C# Team Community - http://www.gotdotnet.com/team/csharp

Visual Studio .NET - http://msdn.microsoft.com/vstudio/

Microsoft .NET Framework - http://msdn.microsoft.com/netframework/

Why is language interop important?

The multi-language capabilities of the .NET runtime have generated a considerable amount of discussion. One common argument is that it's a bad idea for a team to use more than one language, therefore language interoperability isn't important.

That argument ignores the realities of software development.

Being able to choose the right language for your development team means that they'll be more efficient and you'll have happier programmers. Many organizations already have teams that write their complex business objects in C++ and other teams that consume them from Visual Basic. Without language interop, one or both of those teams have to move to a different language with associated porting and retraining costs.

Language interop also levels the language playing field. Because the Win32 library is C-based, C and C++ programmers have always enjoyed full access to that API, while VB programmers have had a harder time. Similarly, C++ programmers who wanted to use components written for VB users also had a difficult time.

The existence of the Common Language Specification means that libraries written in one language can be consumed by other .NET languages. Third-party component providers don't have to limit their market by choosing a language.

Finally, language interop enables the reuse of existing code. If you have complex, well-debugged components written in C++, you'd like to be able to use them without a costly rewrite that could introduce new bugs.

What is an assembly?

Assemblies are the unit of distribution for code written in .NET. An assembly contains the code and descriptive information for one or more .NET classes. All the information required for the runtime to use the classes is contained within the assembly.

The assembly is also the unit the runtime uses when tracking versions and handling security.

What is metadata?

A .NET assembly contains a compiled version of the program code and descriptive information about the types, methods, and other entities that are contained in the assembly.

This information is known as metadata. The runtime uses it to create and track objects, and programmers can also query it at runtime through reflection.

Metadata makes the code in an assembly self-describing; there's no need for separate header files as the compiler reads the metadata directly. It also makes it possible for anybody to write object browsers or other introspection tools. Finally, it enables late-bound scenarios where classes are located and created dynamically at runtime.
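As a sketch of both uses, the following hypothetical code reads property names from a type's metadata and then performs a late-bound creation: it locates a type by name at runtime, creates an instance and reads a property, using only strings to identify them. The Invoice type and its values are made up for the example.

```csharp
using System;
using System.Reflection;

// Hypothetical type to inspect.
public class Invoice
{
    public string Customer { get { return "Contoso"; } }
    public decimal Total { get { return 42.5m; } }
}

public static class MetadataDemo
{
    // Lists public instance property names read from metadata.
    public static string[] PropertyNames(Type type)
    {
        PropertyInfo[] props = type.GetProperties(BindingFlags.Public | BindingFlags.Instance);
        string[] names = new string[props.Length];
        for (int i = 0; i < props.Length; i++)
        {
            names[i] = props[i].Name;
        }
        Array.Sort(names);   // metadata order isn't guaranteed, so sort
        return names;
    }

    // Late binding: find a type by name, create it, and read a property.
    public static object CreateAndRead(string typeName, string propertyName)
    {
        Type type = Type.GetType(typeName);
        object instance = Activator.CreateInstance(type);
        return type.GetProperty(propertyName).GetValue(instance, null);
    }
}
```

A simple type name passed to Type.GetType is only looked up in the calling assembly and the core library; a type in another assembly needs an assembly-qualified name.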

Inside the C# Design Team

I was lucky enough to be able to work on C# language design for several years, and it was a fascinating process. There's a lot of painstaking work to figure out all the details, some disappointments when you can't come up with a good way to express an idea, and satisfaction when a syntax gels.

I was surprised to find that one of the biggest constraints is the words and symbols that you use to build your syntax. C++ has already used most of the special characters, and new uses are only possible if they aren't confusing. Keywords that are short, precise in meaning and carry the right connotation are difficult to find. Because “byte” is more commonly unsigned than signed, we needed a name for the signed version and ultimately chose “sbyte”.

C#: Why Do We Need Another Language?