At PDC 2005, Microsoft announced a new technology called Language Integrated Query (LINQ), which will be available with Visual Studio “Orcas” (the next version of Visual Studio). A lot of exciting new technologies are announced at every PDC, and as a result, LINQ got some attention, but not nearly as much as I think it deserves. LINQ represents the ability to run queries right inside of Visual Basic, C#, or any other .NET language.

Consider this Visual Basic .NET example:

select customer from customer in customers
    where customer.Country = "USA" orderby customer.State

This statement assumes that there is a collection or list of items in memory called “customers.” The statement selects all customer objects within that customers list where the country-property is set to “USA” and returns them as a new collection. The items within that collection will be ordered by state, which is another property on each customer object.

C# supports the same functionality with very similar syntax:

from customer in customers
    where customer.Country == "USA"
    orderby customer.State
    select customer;

The main difference here is that the “select” keyword (and its arguments) is placed at the end of the command.
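To make the C# query concrete, here is a minimal, self-contained sketch. It assumes the System.Linq namespace of the released framework rather than the PDC preview, and the Customer class and its sample data are hypothetical, invented only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Customer type, defined only for this sketch.
public class Customer
{
    public string Name { get; set; }
    public string Country { get; set; }
    public string State { get; set; }
}

public static class QueryBasics
{
    // Made-up sample data.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "Contoso", Country = "USA", State = "TX" },
            new Customer { Name = "Fabrikam", Country = "Germany", State = "BY" },
            new Customer { Name = "Northwind", Country = "USA", State = "CA" }
        };
    }

    // Returns the customers whose Country is "USA", ordered by State.
    public static List<Customer> UsaByState(IEnumerable<Customer> customers)
    {
        var query = from customer in customers
                    where customer.Country == "USA"
                    orderby customer.State
                    select customer;
        return query.ToList();
    }

    public static void Main()
    {
        foreach (Customer c in UsaByState(SampleCustomers()))
            Console.WriteLine(c.State + ": " + c.Name);
    }
}
```

Note that the properties are reached through the range variable (customer.Country, customer.State), just as they would be inside a loop body.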

The LINQ query language is native to C# and VB.NET, although as you can see, it is inspired by standard SQL syntax. It is important to understand, however, that this is not SQL: there are differences in syntax as well as functionality. For instance, all the expressions used in LINQ are native language expressions and not SQL-style expressions. In T-SQL, one can select records where a field starts with a certain character sequence using the following where clause:

where LastName like 'A%'

The LINQ equivalent uses standard .NET expression syntax, built either from the language’s own operators or from methods and properties, as in this example:

where LastName.StartsWith("A")

You can use all .NET expressions, objects, methods, and properties in LINQ. For instance, if you have a list of customer objects where each customer object has a method called “IsPremiumCustomer()” you could use it in the query expression:

select customer from customer in customers
    where customer.IsPremiumCustomer()
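The following C# sketch combines both ideas: a where clause that calls StartsWith() and a method on the object itself. The Customer class, the revenue threshold inside IsPremiumCustomer(), and the sample names are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Customer type for this sketch.
public class Customer
{
    public string LastName { get; set; }
    public decimal YearlyRevenue { get; set; }

    // Hypothetical business rule; the threshold is an arbitrary assumption.
    public bool IsPremiumCustomer()
    {
        return YearlyRevenue >= 10000m;
    }
}

public static class ExpressionDemo
{
    // Any ordinary .NET expression can appear in the where clause,
    // including method calls on the range variable.
    public static List<string> PremiumNamesStartingWith(
        IEnumerable<Customer> customers, string prefix)
    {
        var query = from customer in customers
                    where customer.LastName.StartsWith(prefix, StringComparison.Ordinal)
                          && customer.IsPremiumCustomer()
                    select customer.LastName;
        return query.ToList();
    }

    public static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { LastName = "Adams", YearlyRevenue = 25000m },
            new Customer { LastName = "Anders", YearlyRevenue = 500m },
            new Customer { LastName = "Baker", YearlyRevenue = 90000m }
        };
        foreach (string name in PremiumNamesStartingWith(customers, "A"))
            Console.WriteLine(name);
    }
}
```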

Some people commented to me that the “from x in y” syntax seems unusual, since in T-SQL, you simply state “from x”. The difference here is that in T-SQL, whenever you run a query “from Customers”, it is pretty clear that inside that Customers table you will find a set of customer records, and all expressions needed in the query involve field names from those records. In .NET, on the other hand, it is not clear what type of object is inside a collection and how to access properties on these objects. If you compare the “from” clause with for-each statements, things become a bit clearer. If you want to iterate over a list of customers using for-each, your code might look similar to this example:

foreach (CustomerClass customer in customers)
{
   if (customer.IsPremiumCustomer())
   {
      // Do something
   }
}

The “from x in y” syntax allows for the same type of functionality and for the flexibility needed in a rich language such as C# or VB.NET.
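The parallel between the two forms can be sketched side by side. In this hypothetical example (the method names and the sample list are my own), the loop variable of the foreach and the range variable of the query play the same role, and both versions produce the same result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeVariableDemo
{
    // Conventional loop: "name" is the loop variable.
    public static List<string> WithForeach(IEnumerable<string> names)
    {
        var result = new List<string>();
        foreach (string name in names)
        {
            if (name.Length > 3)
                result.Add(name.ToUpperInvariant());
        }
        return result;
    }

    // Query form: "name" is the range variable declared by "from name in names".
    public static List<string> WithQuery(IEnumerable<string> names)
    {
        var query = from name in names
                    where name.Length > 3
                    select name.ToUpperInvariant();
        return query.ToList();
    }

    public static void Main()
    {
        string[] names = { "Ann", "Bertha", "Cy", "Dolores" };
        Console.WriteLine(string.Join(", ", WithForeach(names).ToArray()));
        Console.WriteLine(string.Join(", ", WithQuery(names).ToArray()));
    }
}
```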

Added Possibilities

One of the major differences between LINQ and a query language such as T-SQL is that .NET does not have any understanding of standard data concepts such as tables and columns. Instead, everything in .NET is based on objects, properties, and methods. Therefore, both the data source you query and the result set are objects. This is rather interesting in itself, since it means that you can use this query language against all .NET objects and not just what you would conventionally think of as “data.” For example, you could query all textboxes on a form that have a certain value, and join them with objects from a collection which you then “union” with a result queried from an XML document.
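As a small sketch of querying something that is not conventional “data,” the example below runs a query over the reflection metadata of a type. The class and method names are my own, and reflection is used here only because it needs no form or database; it stands in for the textbox scenario rather than reproducing it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ObjectQueryDemo
{
    // Queries the public methods of a type, which are ordinary .NET objects,
    // and returns the distinct method names that start with the given prefix.
    public static List<string> MethodNamesStartingWith(Type type, string prefix)
    {
        var names = from method in type.GetMethods()
                    where method.Name.StartsWith(prefix, StringComparison.Ordinal)
                    select method.Name;

        return names.Distinct()
                    .OrderBy(n => n, StringComparer.Ordinal)
                    .ToList();
    }

    public static void Main()
    {
        foreach (string name in MethodNamesStartingWith(typeof(string), "To"))
            Console.WriteLine(name);
    }
}
```

The exact list printed depends on the framework version, so no expected output is shown here.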

Similarly, LINQ can return all kinds of objects and not just records or arrays or DataSets. While you can query data from a DataSet and return individual records, you can also return completely different things. For instance, the following query returns a new object with three properties called “Company”, “Contact”, and “City”:

from cust in customers
    where cust.Country == "USA"
    select new
    {
       Company = cust.CompanyName,
       Contact = cust.ContactName,
       cust.City
    }

This seemingly innocent select statement causes quite a bit of “magic” to happen. The select clause uses the new keyword to instantiate a new object for each returned “record.” However, the type (class) of the object that is to be instantiated is not named. Instead, this example uses C# 3.0 syntax (for more info on C# 3.0, see my last column at http://www.code-magazine.com/Article.aspx?quickid=050123) to create a new object with the three properties and set their initial values to the values provided. This feature is known as “anonymous types.” The result is a collection of objects that have the three properties you are interested in. You do not know the actual type of those objects, and you do not really care, but you get them as a result set and you can use them from that point on. This is equivalent to first creating a class and then instantiating and initializing it for each resulting “row.” In conventional code, that class might look something like this:

public class CustomerResult
{
    private string _companyName;
    private string _contactName;
    private string _city;
    
    public string CompanyName
    {
        get { return _companyName; }
        set { _companyName = value; }
    }
    public string ContactName
    {
        get { return _contactName; }
        set { _contactName = value; }
    }
    public string City
    {
        get { return _city; }
        set { _city = value; }
    }
    
    public CustomerResult(
        string company, string contact,
        string city)
    {
        CompanyName = company;
        ContactName = contact;
        City = city;
    }
}

You could then use this new class and have it instantiated and populated as the result of the query (one object for each resulting “row”):

from cust in customers
    where cust.Country == "USA"
    select new CustomerResult(
        cust.CompanyName,
        cust.ContactName,
        cust.City );

As you can see, the version with anonymous types (and direct value initialization) as it will be available in C# 3.0 is rather more convenient.
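For comparison, here is a runnable sketch of the anonymous-type version. The Customer class and the sample data are hypothetical; the query itself follows the one shown earlier. Because an anonymous type has no name you can write down, the sketch consumes the results with var inside the same method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Customer type; the property names follow the article's query.
public class Customer
{
    public string CompanyName { get; set; }
    public string ContactName { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}

public static class AnonymousTypeDemo
{
    public static List<string> DescribeUsaCustomers(IEnumerable<Customer> customers)
    {
        var rows = from cust in customers
                   where cust.Country == "USA"
                   select new
                   {
                       Company = cust.CompanyName,
                       Contact = cust.ContactName,
                       cust.City   // the property name "City" is inferred
                   };

        // "var" is required here because the element type has no usable name.
        var lines = new List<string>();
        foreach (var row in rows)
            lines.Add(row.Company + " / " + row.Contact + " / " + row.City);
        return lines;
    }

    public static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { CompanyName = "Contoso", ContactName = "Ann", City = "Dallas", Country = "USA" },
            new Customer { CompanyName = "Fabrikam", ContactName = "Bernd", City = "Munich", Country = "Germany" }
        };
        foreach (string line in DescribeUsaCustomers(customers))
            Console.WriteLine(line);
    }
}
```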

Here is a completely different example of a query returning a non-conventional result set.

from cust in customers
    where cust.Country == "USA"
    select new CustomerEditForm(cust.ID);

This example creates instances of a class called “CustomerEditForm.” Assuming that you’ve created such a class as a Windows Form that takes an ID as a constructor parameter to load a customer for editing, this query creates an edit form for each and every customer from the US.
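One detail worth keeping in mind with a query like this: in the released LINQ to Objects implementation, the select clause runs only when the query is enumerated, so no objects are constructed until then. The sketch below demonstrates this with a hypothetical stand-in class (FakeEditForm) that merely counts how many instances were created; it is not a real Windows Form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a form class; it only counts constructions.
public class FakeEditForm
{
    public static int Created;
    public int CustomerId { get; private set; }

    public FakeEditForm(int customerId)
    {
        CustomerId = customerId;
        Created++;
    }
}

public static class DeferredDemo
{
    // Returns the construction count before and after enumerating the query.
    public static int[] Run()
    {
        FakeEditForm.Created = 0;
        int[] ids = { 1, 2, 3 };

        // Defining the query does not run the select clause yet.
        var query = from id in ids
                    select new FakeEditForm(id);
        int beforeEnumeration = FakeEditForm.Created;

        // ToList() enumerates the query and constructs one object per id.
        List<FakeEditForm> forms = query.ToList();
        int afterEnumeration = FakeEditForm.Created;

        return new[] { beforeEnumeration, afterEnumeration, forms.Count };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("Before: " + counts[0] + ", after: " + counts[1]);
    }
}
```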

The Accidental Magic of Extension Methods

Microsoft will implement much of LINQ through another C# 3.0 mechanism known as an extension method. Extension methods are static methods that (when in scope) can be called as if they were instance methods of the type they extend. For example, you can write a “Where()” extension method. Whenever the class that contains that method is in scope (by way of a “using” statement), all objects of the extended type that do not already have a Where() method with the same signature (and only those!) automatically get the Where() extension method.
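To illustrate the mechanism itself, here is a minimal sketch of a hand-written extension method, using the released C# 3.0 syntax. The IsBlank() method and the StringExtensions class are hypothetical names chosen for this example; they are not part of LINQ or the framework.

```csharp
using System;

public static class StringExtensions
{
    // The "this" modifier on the first parameter turns this static method
    // into an extension method for string.
    public static bool IsBlank(this string text)
    {
        return text == null || text.Trim().Length == 0;
    }
}

public static class ExtensionMethodDemo
{
    public static void Main()
    {
        string empty = "   ";
        string word = "LINQ";

        // Called as if IsBlank() were an instance method of string...
        Console.WriteLine(empty.IsBlank());
        Console.WriteLine(word.IsBlank());

        // ...which the compiler treats as a plain static call.
        Console.WriteLine(StringExtensions.IsBlank(word));
    }
}
```

Because the call is really a static call, it even works when the variable is null, which an actual instance method would not.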

LINQ relies heavily on this mechanism, since all LINQ statements are really turned into standard object syntax before they are compiled for real. Consider the following command:

from s in names where s.Length > 10 select s

The same command can also be spelled out in object notation:

names.Where(s => s.Length > 10).Select(s => s);

These two statements are functionally identical. Not surprisingly, that object notation is what the compiler really creates behind the scenes. (If the parameters puzzle you, they are lambda expressions, which are an evolution of C# 2.0’s anonymous methods.)
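The equivalence is easy to check with a short sketch. The class name, method names, and sample names below are my own; the two queries are the ones shown above, written against the released System.Linq namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxComparison
{
    // Query-expression form.
    public static List<string> WithQuerySyntax(IEnumerable<string> names)
    {
        return (from s in names where s.Length > 10 select s).ToList();
    }

    // Method-call form using lambda expressions.
    public static List<string> WithMethodSyntax(IEnumerable<string> names)
    {
        return names.Where(s => s.Length > 10).Select(s => s).ToList();
    }

    public static void Main()
    {
        string[] names = { "Christopher", "Ann", "Maximiliana", "Bob" };
        bool same = WithQuerySyntax(names).SequenceEqual(WithMethodSyntax(names));
        Console.WriteLine("Same results: " + same);
    }
}
```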

The trouble with this is that not all objects in .NET have Where() methods, and that is where extension methods come into play. Extension methods basically make LINQ “tick.” They also have the interesting side-effect that every enumerable object (arrays, lists, and other collections) now has bits and pieces of LINQ attached as individual methods. For instance, if you have an array of strings and you want to pick a number of values from it, you could do something like this:

myArray.Where(s => s.Length < 5);

The result of this will be a list of strings from your original array, but only those that have a length of less than five characters. Similarly, you can use other LINQ methods as well. Perhaps you want to apply grouping to a list of objects, or maybe you would like to join two different sources of objects? No problem! Just pick the bits and pieces of LINQ that are of interest to you.
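Here is a sketch of picking individual pieces in that way: filtering, grouping, and joining through direct method calls, without any query expression. The method names and sample data are hypothetical; Where(), GroupBy(), and Join() are the standard methods from the released System.Linq namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BitsAndPieces
{
    // Filtering: only the strings shorter than five characters.
    public static List<string> ShortWords(string[] words)
    {
        return words.Where(s => s.Length < 5).ToList();
    }

    // Grouping: counts the words per first letter, e.g. "a:2".
    public static List<string> CountByFirstLetter(IEnumerable<string> words)
    {
        return words.GroupBy(w => w[0])
                    .Select(g => g.Key + ":" + g.Count())
                    .ToList();
    }

    // Joining: matches two sources on their integer key.
    public static List<string> JoinById(
        IEnumerable<KeyValuePair<int, string>> names,
        IEnumerable<KeyValuePair<int, string>> cities)
    {
        return names.Join(cities,
                          n => n.Key,
                          c => c.Key,
                          (n, c) => n.Value + ": " + c.Value)
                    .ToList();
    }

    public static void Main()
    {
        string[] words = { "apple", "fig", "avocado", "kiwi", "banana" };
        Console.WriteLine(string.Join(", ", ShortWords(words).ToArray()));
        Console.WriteLine(string.Join(", ", CountByFirstLetter(words).ToArray()));
    }
}
```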

Another interesting aspect of extension methods and how they are used in this case is that classes that already have methods with the same signature will not get the extension methods. Instead, the existing methods will be used. This enables the developer to override how parts of a LINQ expression evaluate. You don’t like what the default where clause does with your objects? Well, just write your own Where() method and you are all set!
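The override behavior can be sketched with a collection class that supplies its own Where() method. AuditedNames and its WhereCalls counter are hypothetical names invented for this example; the point is only that the query expression binds to the instance method rather than to the standard extension method.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection that provides its own Where() method.
public class AuditedNames : IEnumerable<string>
{
    private readonly List<string> _items;

    public int WhereCalls { get; private set; }

    public AuditedNames(IEnumerable<string> items)
    {
        _items = new List<string>(items);
    }

    // Same shape as the standard Where() extension method. Because it is an
    // instance method, the compiler prefers it over the extension method.
    public IEnumerable<string> Where(Func<string, bool> predicate)
    {
        WhereCalls++;
        return _items.Where(predicate); // delegates to the standard implementation
    }

    public IEnumerator<string> GetEnumerator() { return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class OverrideDemo
{
    public static List<string> LongNames(AuditedNames names)
    {
        // The where clause compiles into a call to AuditedNames.Where().
        var query = from n in names
                    where n.Length > 3
                    select n;
        return query.ToList();
    }

    public static void Main()
    {
        var names = new AuditedNames(new[] { "Al", "Bertha", "Cy", "Dolores" });
        List<string> result = LongNames(names);
        Console.WriteLine(result.Count + " names, custom Where() calls: " + names.WhereCalls);
    }
}
```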

D-LINQ and X-LINQ

Of course, data doesn’t just exist in .NET memory. Two other important places where you’ll find data are databases and XML documents. This is where D-LINQ and X-LINQ shine. D-LINQ provides special objects in addition to the standard LINQ objects that allow querying straight from the database. D-LINQ allows for data mapping through a simple class-based mechanism. All query expressions are then dealt with as an “expression tree,” which allows any LINQ expression to be converted into a different equivalent expression such as T-SQL. Using this technique, the following C# code snippet can actually perform a native query in SQL Server:

from c in customers
    where c.LastName.StartsWith("A")
    select c

By way of an expression tree, this C# query gets translated into a valid T-SQL query which then executes on the server. C# developers never need to learn the native T-SQL syntax. Also, think of the possibilities this opens up for CLR stored procedures!
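The expression-tree idea can be sketched without a database. The example below uses the released System.Linq.Expressions namespace to capture the filter as data and then walks that data to produce a T-SQL-style fragment. The ToLikeClause() translator is a toy of my own that handles exactly one expression shape and does no escaping; it only hints at the kind of translation D-LINQ performs and is not its actual implementation.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity type for this sketch.
public class Customer
{
    public string LastName { get; set; }
}

public static class ExpressionTreeDemo
{
    // Assigning a lambda to Expression<...> makes the compiler build a tree
    // describing the code instead of compiling it to executable IL.
    public static Expression<Func<Customer, bool>> StartsWithA()
    {
        return c => c.LastName.StartsWith("A");
    }

    // Toy translator: handles only member.StartsWith("constant").
    public static string ToLikeClause(Expression<Func<Customer, bool>> filter)
    {
        var call = filter.Body as MethodCallExpression;
        if (call == null || call.Method.Name != "StartsWith")
            throw new NotSupportedException("Only StartsWith calls are handled in this sketch.");

        var member = call.Object as MemberExpression;
        var prefix = call.Arguments[0] as ConstantExpression;
        if (member == null || prefix == null)
            throw new NotSupportedException("Unexpected expression shape.");

        // No escaping is done here; real translators use parameters instead.
        return member.Member.Name + " LIKE '" + prefix.Value + "%'";
    }

    public static void Main()
    {
        Expression<Func<Customer, bool>> filter = StartsWithA();
        Console.WriteLine(ToLikeClause(filter));

        // The same tree can still be compiled and run in memory.
        Func<Customer, bool> compiled = filter.Compile();
        Console.WriteLine(compiled(new Customer { LastName = "Adams" }));
    }
}
```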

X-LINQ is the D-LINQ equivalent for XML. Using X-LINQ, you can query all customers whose last name starts with an “A” in the following fashion:

from c in customerXml.Descendants("Customer")
    where c.Element("LastName").Value.StartsWith("A")
    select c

I think X-LINQ is more powerful than D-LINQ, because in addition to querying XML, it can also create XML. (A full explanation of X-LINQ and D-LINQ is beyond the scope of this article. However, these technologies will be explained in detail in my upcoming LINQ article in the printed version of CoDe Magazine.)
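Both directions, querying XML and creating it, can be sketched with the System.Xml.Linq namespace of the released framework (the shipping form of what this article calls X-LINQ). The XML document, element names, and method names below are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlQueryDemo
{
    // Made-up sample document.
    public const string SampleXml =
        "<Customers>" +
        "<Customer><LastName>Adams</LastName></Customer>" +
        "<Customer><LastName>Baker</LastName></Customer>" +
        "<Customer><LastName>Allen</LastName></Customer>" +
        "</Customers>";

    // Querying: last names that start with "A".
    public static List<string> LastNamesStartingWithA(XDocument doc)
    {
        var query = from c in doc.Descendants("Customer")
                    let lastName = (string)c.Element("LastName") // null if the element is missing
                    where lastName != null
                          && lastName.StartsWith("A", StringComparison.Ordinal)
                    select lastName;
        return query.ToList();
    }

    // Creating: a query nested inside an XElement constructor builds new XML.
    public static XElement BuildSummary(IEnumerable<string> names)
    {
        return new XElement("Matches",
            from n in names
            select new XElement("Name", n));
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);
        XElement summary = BuildSummary(LastNamesStartingWithA(doc));
        Console.WriteLine(summary.ToString(SaveOptions.DisableFormatting));
    }
}
```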

Conclusion

There really isn’t a conclusion at this point. LINQ will be a powerful query language, but it is also so much more. At this point, I can see only some of the possibilities emerging, and it will be years before developers will be able to determine the true potential of LINQ and its variations and incarnations. At that point, we will be ready to come to a conclusion. For now, I’m just excited and enthused. And I’m a bit bummed, because it will be a while before Microsoft will release LINQ, which will presumably happen in the “Orcas” timeframe (the next version of Visual Studio). Expect some in-depth LINQ articles in future issues of CoDe Magazine.