In show #270 Richard and I talked to Erik Meijer from Microsoft about LINQ. In this excerpt we talk about LINQ to Entities.
Carl Franklin: What was your hand in [LINQ]? I mean, what was your role?
Erik Meijer: I was on both the VB and the C# design teams, so I kind of worked with both teams on VB 9.0 and C# 3.0. Before that, when I was an academic, I already did a lot of research in this area. As you mentioned in the introduction, I designed a number of languages that attempted goals similar to LINQ’s, with data programming support built directly into the programming language instead of going via libraries or APIs.
Carl Franklin: Was it a high level involvement or did you actually get to roll up your sleeves and do the work?
Erik Meijer: Well, I was on the design team so that was like, for example, for C# that means three afternoons a week where we’re kind of designing the language and all the nitty-gritty details. For VB, that’s two afternoons a week where we do the same. I also had a team here that did the XML integration in VB. That was like two mornings, so I kind of spent nearly all of my time on the design. I didn’t write any code in the actual compiler or things like that. I wrote a lot of demos, but it [was] mainly design.
Carl Franklin: Okay. Let me just share with you my observations about developers in the community and some of the old-timers that I know. The question comes up, “What do you think about LINQ?” and I’ve noticed, and tell me if you find this to be true also, that the programmers who are like one-man shops, who work on small projects and maybe research projects for talks and things like that, they’re probably not getting [as much] excitement around LINQ, but those who are developing large enterprise applications where there’s a lot of complexity and there are a lot of objects to manage, that’s where they really, really think that LINQ is a godsend. You think that’s a fair assessment?
Erik Meijer: Well, I think the small shops probably also should benefit, or will benefit, a lot from LINQ because the way I look at it, you know, if you look at SQL programming as you do it today, if you’re using, say, ADO.NET or some existing framework as it currently ships where you have to write SQL strings by hand, I think you already benefit even if your applications are fairly simple and don’t involve thousands of tables and things like that. Anybody that writes a SQL thing, as a string, and gets back a dataset or something like that I think is better off using LINQ or LINQ to SQL or LINQ to Entities. If you’re dealing with XML, [and] I think the joke is getting old, but DOM in Dutch means brain dead. Everybody uses XML and I think that’s kind of getting so much easier with LINQ than before.
Carl Franklin: Well, I’ve actually got a story about that, and it’s funny you brought it up. Just this morning we had an e-mail server meltdown here, and it was a planned meltdown, but we basically stopped using [an in-house mail server] and moved over to a hosted system. So, in doing that, I was left with these people who had contacts in an IMAP XML file and no real way to move them up to this new system, so having just done a dnrTV episode or two with Don XML on LINQ to XML, I just picked out the sample code and followed it very easily and was able to move all that data into a CSV file in about half an hour.
Erik Meijer: Yup.
Carl Franklin: And it worked, and it was easy and great and I thought to myself, “I would never have done this with XPath.” I just don’t know enough about it to be able to do it.
Richard Campbell: And it strikes me that this is the strength of LINQ: it’s one querying methodology, and it’s the same whether I’m talking to SQL or I’m talking to XML or some other chunk of data.
Erik Meijer: Yup. That’s exactly that kind of power. So, the way I usually explain it is that if you look at the way that query languages and data models are currently in silos, you have relational data that has SQL, then you have XML that has XPath or XQuery or XSLT, and even if you’re trying to write object-oriented programs over collections, you have to write ad hoc queries with loops and things like that. The nice thing [about] LINQ is that it factors out the commonality between all those data models instead of emphasizing the differences. The thing is that if you look at all those data models, there’s a lot of commonality because they’re all dealing with collections of things. What LINQ gives you is the LINQ Query Expressions or Query Comprehensions. Those allow you to express the operations on these collections and the data model doesn’t really matter that much because all the queries are kind of at this higher level of abstraction. So, you only have to learn how to formulate queries over collections and then it will work over anything.
The thing is then that [LINQ] maybe kind of benefits the small shops even more because there, you have to do all this with a limited number of people so you cannot afford to have people that specialize in XPath or XSLT or something like that. So, just knowing C# or VB gets you way, way further than before.
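To make the "one query shape over any collection" idea concrete, here is a minimal sketch. It is written in Python rather than C# or VB, so it is only an analogy to LINQ query expressions, not LINQ itself. The function name `names_in_city`, the sample contacts, and the table and element names are all made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def names_in_city(people, city):
    """One query shape: filter on city, project the name, sort the result."""
    return sorted(p["name"] for p in people if p["city"] == city)


# Source 1: plain in-memory objects.
in_memory = [
    {"name": "Ann", "city": "Boston"},
    {"name": "Bob", "city": "Seattle"},
]

# Source 2: an XML document, adapted to the same record shape.
xml_doc = ET.fromstring(
    "<contacts>"
    "<contact name='Cid' city='Boston'/>"
    "<contact name='Dee' city='Denver'/>"
    "</contacts>"
)
from_xml = [dict(el.attrib) for el in xml_doc.iter("contact")]

# Source 3: rows from a relational table (in-memory SQLite, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Eve", "Boston"), ("Flo", "Austin")],
)
from_sql = [
    {"name": name, "city": city}
    for name, city in conn.execute("SELECT name, city FROM contacts")
]

# The same query runs unchanged over all three sources.
results = [names_in_city(src, "Boston") for src in (in_memory, from_xml, from_sql)]
print(results)  # [['Ann'], ['Cid'], ['Eve']]
```

Note that this sketch copies every SQL row to the client before filtering. A provider such as LINQ to SQL would instead translate the query into SQL, which is where the question of where a query executes comes in.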
Richard Campbell: That’s a good explanation of it. It’s exciting to me. As a data guy, I looked at things like the idea that I would be able to take two collections from different sources, say, a multi-select box or a dropdown box where I’ve selected six or seven different items, and that is actually used as selection criteria to pull data from a database, and I could write this as a single expression in LINQ, really as a join.
Erik Meijer: Yup.
Richard Campbell: The only thing that makes me nervous then is how is that actually implemented? How smart is it to do that efficiently? Would it actually just pull all the data from the database and then filter out that which it didn’t need based on the join? How’s it going to work that out? When I hand code stuff like that, I would actually create a connection to the database, create a temporary table, fire those half-dozen rows from the selection up, and then do all the work on the database, but then I’m a database geek. That’s how I would do those things.
Erik Meijer: Yup. Well, that’s a very good question. The thing is, LINQ is no silver bullet, right? So, it’s not suddenly, you know, magic happens.
Richard Campbell: Right.
Erik Meijer: You still have to be kind of mindful where things execute, whether they execute locally or remotely and things like that. There’s no kind of magic there. So, I really expect that people will often inspect LINQ to SQL or LINQ to Entities to kind of understand what really goes on where. The nice thing is that you can now formulate your problem at the higher level of abstraction and so a lot of the kind of noise is taken care of, but you still have to think about the efficiency of your program and where things execute. If you can do some filtering on the database that’s probably more efficient than bringing huge amounts of data to the client and filtering it there and things like that. In that respect, things don’t change, but you have to write less and at a higher level of abstraction. You gain a lot by that.
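The local-versus-remote trade-off can be sketched without LINQ at all. The example below is a Python illustration using an in-memory SQLite database; the `orders` table, its contents, and the `selected` list (standing in for the handful of items picked in a multi-select box) are hypothetical. It contrasts pulling everything to the client with sending the selection to the database as query parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO orders (id, category) VALUES (?, ?)",
    [(i, f"cat{i % 50}") for i in range(1000)],
)

# Hypothetical user selection from a multi-select box.
selected = ["cat1", "cat7", "cat42"]

# Client-side: every row crosses the wire, then we filter locally.
all_rows = conn.execute("SELECT id, category FROM orders").fetchall()
client_side = sorted(row for row in all_rows if row[1] in selected)

# Database-side: the selection travels as parameters and only
# the matching rows come back.
placeholders = ", ".join("?" for _ in selected)
server_side = conn.execute(
    f"SELECT id, category FROM orders "
    f"WHERE category IN ({placeholders}) ORDER BY id",
    selected,
).fetchall()

print(len(all_rows), len(server_side))  # 1000 60
```

Both approaches produce the same rows, but the first transfers 1000 rows to keep 60. That is the kind of difference worth checking for in the SQL that a LINQ provider generates.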
Richard Campbell: So, given the scenario I just painted there, would the solution be to not use LINQ there or is there a way to go underneath LINQ and say, “And I want you to do it this way?”
Erik Meijer: Oh, no. Definitely, you can do this with LINQ and if you write your query right, you can get exactly what you want, where things are executed on the database and the join is done over there.
Richard Campbell: Right. So, there is a way for me within LINQ to say, “I want you to execute this join on this server rather than do it here.”
Erik Meijer: Yes. The thing is it’s not all that explicit, so you have to look at your query and the way you write your query. It kind of depends on how the queries are formulated, and we’re going a little bit into very deep details here, but usually the first data source that you’re selecting from is where the query is executed.
Richard Campbell: Ah. Interesting, but I think these are exactly the things that people want to know.
Carl Franklin: Yeah, I think so.
Richard Campbell: LINQ is a big enough umbrella that you can use it for all of your querying requirements and are still going to be able to drill in and tune and to find a way to solve that performance problem. You know, 90% of my queries or 95% of my queries in LINQ are going to run just fine and the few that don’t, I don’t have to recode them, I just have to tweak them.
Erik Meijer: Yes. The nice thing is also that we give you a lot of kind of hooks and so on, so that in case you really need to tune them, we give you a way out. For example, you can map a lot of things to stored procedures or we give you hook points where you can bypass the kind of standard translation. So, in case you really want to get control, you can do that.
Richard Campbell: Cool.
Carl Franklin: Now, you can do that with attributes on your Entity objects. Is that true?
Erik Meijer: Yes. There’s kind of a lot of [ways] you can influence the mapping to do that. That’s correct.
Carl Franklin: That’s pretty awesome that you can just build these Entity classes and then decorate them with attributes that say, “Oh, by the way, this property is associated with this field in this database with this key.”
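The idea of decorating a class so that each property knows which column and key it maps to can be sketched roughly as follows. This is a Python analogy using dataclass field metadata, not the actual .NET attribute syntax. The `Customer` class, the `TABLE` name, the column names, and the helper `select_by_key_sql` are all hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Customer:
    # Hypothetical table name; a plain class attribute, not a mapped field.
    TABLE = "Customers"

    # Each property carries its own mapping: column name and key flag.
    customer_id: int = field(metadata={"column": "CustomerID", "primary_key": True})
    name: str = field(metadata={"column": "ContactName"})


def select_by_key_sql(cls):
    """Build a parameterized SELECT purely from the declared mapping."""
    columns = [f.metadata["column"] for f in fields(cls)]
    key = next(
        f.metadata["column"] for f in fields(cls) if f.metadata.get("primary_key")
    )
    return f"SELECT {', '.join(columns)} FROM {cls.TABLE} WHERE {key} = ?"


sql = select_by_key_sql(Customer)
print(sql)  # SELECT CustomerID, ContactName FROM Customers WHERE CustomerID = ?
```

The point of the sketch is only that the mapping lives next to the class definition, so generic code can derive the plumbing from it instead of each query spelling it out by hand.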
Erik Meijer: Yup. Again, the way I look at it is whenever you write code, there are two main aspects. There is the plumbing and so on that you have to do and then there’s the real formulation of the algorithms in solving the problem.
Carl Franklin: Right.
Erik Meijer: LINQ takes care of a lot of that low-level plumbing so that you don’t have to think about that, but in certain situations you will have to do a little bit of the plumbing and we don’t take that away from you.
Carl Franklin: But with a good code generator, you can alleviate some of that pain.
Erik Meijer: Yes.
Carl Franklin: Tell us about the data context class. What is that? [Is] this a new thing in LINQ to SQL?
Erik Meijer: Yes. The data context class, in some sense, I would describe that as the abstraction of your database. That’s the client-side object that represents the remote database. From there you can access tables, transactions, your connection string, etc. So, it kind of encapsulates your database.
Carl Franklin: Okay. So, this is just kind of what it says, it’s a context that identifies what your database is and abstracts it away.
Erik Meijer: Yes.
Richard Campbell: So, then I am able to refer to that and it will pull data as needed?
Erik Meijer: Yes. Again, that depends. The data context, it does a lot of things. For example, when you say, “Will the data be loaded?” that’s just one thing that you can set on the data context, whether you want to have eager loading or lazy loading. This is kind of one of the things that you can control on there. Another thing that the data context does is change tracking, so if you get an object from the database into memory and then you do another query that returns a row representing the same object, it will keep track of that and will return the same object. It will do change tracking for you such that when you change the object and then you want to submit the changes back to the database, the data context will know what to update in the database. Data context is the heart of LINQ to SQL and also LINQ to Entities has a similar mechanism there.
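The two behaviors described here, returning the same in-memory object for the same row and knowing what to write back on submit, can be illustrated with a toy stand-in. This is a Python sketch under simplifying assumptions, not how LINQ to SQL is implemented. The class `TinyContext`, its methods, and the `customers` table are invented for the example, and objects are plain dictionaries.

```python
import sqlite3


class TinyContext:
    """Toy stand-in for a data context: identity map plus change tracking."""

    def __init__(self, conn):
        self.conn = conn
        self._identity = {}  # primary key -> the one in-memory object
        self._original = {}  # primary key -> snapshot of the values as loaded

    def get_customer(self, customer_id):
        # If this row was loaded before, hand back the very same object.
        if customer_id in self._identity:
            return self._identity[customer_id]
        row = self.conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        obj = {"id": row[0], "name": row[1]}
        self._identity[customer_id] = obj
        self._original[customer_id] = dict(obj)
        return obj

    def submit_changes(self):
        # Write back only the objects that differ from their loaded snapshot.
        updated = 0
        for key, obj in self._identity.items():
            if obj != self._original[key]:
                self.conn.execute(
                    "UPDATE customers SET name = ? WHERE id = ?",
                    (obj["name"], key),
                )
                self._original[key] = dict(obj)
                updated += 1
        self.conn.commit()
        return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

ctx = TinyContext(conn)
first = ctx.get_customer(1)
second = ctx.get_customer(1)  # same row again
ctx.get_customer(2)           # loaded but never modified
first["name"] = "Anne"
updated = ctx.submit_changes()
stored = conn.execute("SELECT name FROM customers WHERE id = 1").fetchone()[0]
print(first is second, updated, stored)  # True 1 Anne
```

Only the modified customer is written back; the untouched one generates no UPDATE. Even in this toy form the bookkeeping is easy to get subtly wrong, which is the argument for having the framework do it once.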
Carl Franklin: That’s what you use to do your CRUD basically, right? You submit changes, for example.
Erik Meijer: Yes.
Richard Campbell: But I can also see where being able to say these are type tables and they’ve only got a few hundred rows in them and I want them eagerly loaded, whereas this is the order table and it’s enormous, I’m really only ever going to write to it so I don’t need that thing loaded. Being able to make those different specifications means that so much is going to run quickly now because you’re pre-loading the right things. You’re able to execute locally with a lot of information at your fingertips rather than having to keep going back to the database and retrieve it over and over and over again.
Erik Meijer: Yup, that’s correct. So, here’s one way you can look at this. The data context encapsulates a lot of functionality that you would normally have to write by hand, and everybody would have to write it by hand, and these are all quite subtle things like change tracking or eager loading and lazy loading: how do you do that, if you want to do eager loading can you filter the stuff that you bring in, and so on. How you traverse relationships, all of that is done by the data context and influenced by the mapping. Then there are the query translations: you write your LINQ queries as expressions using these query comprehensions, and then the data context takes those and translates them to SQL. So, it’s like enormously powerful and that’s where you get the added value.
The conversation continues online at http://www.dotnetrocks.com/default.aspx?showNum=270