Relational Database Persistence with NHibernate, Part 2

This article continues from the May/June 2009 issue of CODE Magazine (Quick ID 0906081) which covered why you want to use NHibernate, techniques for configuring NHibernate, how to map your objects to your data entities, and how to load basic objects.

One-to-Many Collections

You now have the ability to load a given SalesOrderHeader and load its associated Customer object. What about loading all the SalesOrderHeaders for a given Customer? That’s called a one-to-many (OTM) relationship.

Since there will potentially be many orders for a customer, you will need a property on Customer whose type will be an IEnumerable-derived type or list structure of some kind (Array, IList<T>, IEnumerable<T>, and so on). This will allow you to for-each over the list of orders for a given customer, which would be the most natural thing from the .NET code perspective.

In order to map this situation to a database structure with NHibernate, you have to choose among five different types of list structures.

Set - This is similar to the mathematical “finite set” concept (i.e., a collection with no duplicates, which may or may not be ordered). This corresponds to the Iesi.Collections.ISet interface from the Iesi.Collections.dll assembly shipped with NHibernate. A separate assembly is required because there is no ISet in the .NET Framework Base Class Library, and a general-purpose set collection is beyond the scope of what a persistence framework should provide, so it lives in its own assembly rather than in NHibernate itself.
Bag - A list of entities which may contain duplicates and which has no specific or intentional order. This type corresponds to the ICollection/ICollection<T> or IList/IList<T> .NET types.
Array - A plain, regular .NET array of a particular type.
Map - An IDictionary/IDictionary<TKey, TValue> whose key and value are specified by the mapping.
List - Another form of ICollection/IList (or their generic forms) similar to bag except the list index is specified by the mapping. A real-world example of this might be line items for an order where the index is the LineItemNumber column so that, when loaded from the database, NHibernate will set up the IList ordered by the LineItemNumber.
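The key behavioral difference between a bag and a set can be shown in plain .NET code, with no NHibernate involved. In this sketch, List<T> plays the bag and HashSet<T> (available since .NET 3.5) stands in for Iesi.Collections’ ISet; the CollectionSemantics class and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper contrasting bag and set semantics in plain .NET.
public static class CollectionSemantics
{
    // A bag keeps every element it is given, duplicates included.
    public static int CountAsBag(IEnumerable<string> items)
    {
        var bag = new List<string>(items);
        return bag.Count;
    }

    // A set keeps only distinct elements; duplicates passed in are discarded.
    public static int CountAsSet(IEnumerable<string> items)
    {
        var set = new HashSet<string>(items);
        return set.Count;
    }
}
```

Given the purchase order numbers "PO1", "PO2", "PO1", the bag reports three elements and the set reports two.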

Bag and Set are perhaps the most commonly used list mappings. I would encourage you to try the others to see what fits best with your situation.

Adding the Collection to the Entity

Start by adding a property called “SalesOrders” to the Customer class of type IList<SalesOrderHeader> (IList(Of SalesOrderHeader) for the VB folks). Make sure to make it virtual. You should also consider initializing the IList in your constructor to a default empty List<SalesOrderHeader> in case you use your Customer object before it’s saved or loaded through NHibernate.
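A minimal sketch of the Customer class described above might look like the following. It assumes the CustomerID identifier from the mapping and the PurchaseOrderNumber and Customer properties on SalesOrderHeader used elsewhere in this article; the SalesOrderID property name is an assumption. The constructor assigns the private field rather than the virtual property, which avoids calling a virtual member from a constructor.

```csharp
using System;
using System.Collections.Generic;

public class SalesOrderHeader
{
    public virtual int SalesOrderID { get; set; }           // assumed identifier name
    public virtual string PurchaseOrderNumber { get; set; }
    public virtual Customer Customer { get; set; }          // the many-to-one side
}

public class Customer
{
    private IList<SalesOrderHeader> salesOrders;

    public Customer()
    {
        // Start with an empty list so a brand-new Customer can be used
        // before NHibernate has ever saved or loaded it.
        salesOrders = new List<SalesOrderHeader>();
    }

    public virtual int CustomerID { get; set; }

    // virtual so that NHibernate's lazy-loading proxies can override it
    public virtual IList<SalesOrderHeader> SalesOrders
    {
        get { return salesOrders; }
        set { salesOrders = value; }
    }
}
```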

At this point, you may be wondering: “What if I don’t want an IList directly exposed to callers? What if I want the IList private and have accessor/mutator methods like AddOrder() or RemoveOrder()?” These are valid questions and NHibernate can handle them just fine. I’d like to demonstrate the simple case first, before I get into some more complicated aspects of mapping such as mapping private fields, and so on.
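On the object side, an encapsulated version might look like the sketch below. It is only an illustration: the AddOrder() and RemoveOrder() names come from the question above, everything else is assumed, and it does not show how to map the private field with NHibernate. Note how AddOrder() keeps both ends of the association in sync, so callers cannot forget to set the order’s Customer.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class SalesOrderHeader
{
    public virtual string PurchaseOrderNumber { get; set; }
    public virtual Customer Customer { get; set; }
}

public class Customer
{
    // Private list; deliberately not readonly so a persistence framework
    // can swap in its own collection implementation when loading.
    private IList<SalesOrderHeader> salesOrders = new List<SalesOrderHeader>();

    // Callers can enumerate the orders but cannot add or remove them directly.
    public virtual IEnumerable<SalesOrderHeader> SalesOrders
    {
        get { return new ReadOnlyCollection<SalesOrderHeader>(salesOrders); }
    }

    // Sets both ends of the bi-directional association in one place.
    public virtual void AddOrder(SalesOrderHeader order)
    {
        if (order == null) throw new ArgumentNullException("order");
        if (salesOrders.Contains(order)) return;  // ignore an order already in the list
        order.Customer = this;
        salesOrders.Add(order);
    }

    public virtual void RemoveOrder(SalesOrderHeader order)
    {
        if (order != null && salesOrders.Remove(order))
            order.Customer = null;
    }
}
```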

Mapping a One-to-Many Collection

Since the SalesOrderHeader table has no implied ordering (such as a LineItemNumber column), you can safely map the relationship using a <bag> or a <set>. Bags are a little easier to get started with because they don’t require the extra Iesi.Collections assembly reference mentioned above.

Ultimately, I recommend you seriously consider using ISet for most one-to-many situations like this because its no-duplicates semantics are what most situations expect. There are some downsides to using a <bag> that may trip you up later. However, in an effort to get started quickly, let’s map the “SalesOrders” property as a bag. You can change it later if you start running into problems.

You need three things to map a collection like this:

The collection type (bag, set, map, list, etc.).
The key column on the child table that points back to this entity’s ID (i.e., the CustomerID column in the SalesOrderHeader table).
The name of the child class that will be contained in the collection (i.e., SalesOrderHeader).

The one-to-many mapping from Customer to SalesOrderHeader will end up looking like this:

<class name="Customer">
  <id name="CustomerID">
    <generator class="native" />
  </id>
    
  <!-- <property> tags here -->
    
  <bag name="SalesOrders" inverse="true">
    <key column="CustomerID"/>
    <one-to-many class="SalesOrderHeader"/>
  </bag>
</class>

A Quick Note About Bi-directional Associations

You may have noticed that I slipped an inverse="true" attribute into that mapping. This is because you’ve now mapped SalesOrderHeader to Customer as well as Customer to SalesOrderHeader. You’ve mapped the relationship both ways (bi-directional) and, without the “inverse” attribute, haven’t indicated to NHibernate that these two relationships are, in fact, the same one. Perhaps a future version of NHibernate will be smart enough to determine this automatically, but for now the problem can get complicated when more relationships are involved, so you have to give NHibernate some hints.

Without the “inverse” attribute, NHibernate will view these mappings as two, distinct relationships and treat them as such. Thus, when adding a new SalesOrderHeader to an existing Customer, NHibernate would likely INSERT the SalesOrderHeader and then issue a separate UPDATE statement to update its CustomerID field, thus relating it to the Customer. This is excessive because NHibernate actually has all the information it needs to issue one INSERT statement with everything set up properly.

In order to help NHibernate understand your intention, you must tell it that one side of the relationship is the “primary” one. For an OTM/MTO relationship, the OTM side (Customer to SalesOrderHeader) should be the “inverse” and the MTO side (SalesOrderHeader to Customer) should be the primary.

Saving and Updating OTM Collections

Now that I have the bi-directional relationship wired up correctly, I can talk about how to properly relate the objects together through code. Again, imagine what would be required to create this association purely in memory, as if there were no database to worry about. Likely, that code would look like:

var customer = createNewCustomer();
    
for (var i = 0; i < 5; i++)
{
    var order = createNewOrder("PO" + i);
    order.Customer = customer;
    customer.SalesOrders.Add(order);
}

Not surprisingly, except for the addition of a call or two to ISession.Save(), this is exactly the code that will create the proper structure for NHibernate to persist the relationship properly to the database. This goes back to some of my original points about a good O/RM not being too intrusive into how the object-oriented code works and functions while also not imposing unnecessary requirements on the relational database side.

By default, NHibernate will not do any cascading of operations (such as deleting a related entity when you remove it from the collection, etc.). NHibernate does, however, have very powerful and configurable cascade and life-cycle management features. I’ll go into more depth on these later. Let’s take a look at what the above code looks like after I add the NHibernate calls into it:

// session: an open ISession; xaction: the ITransaction returned by session.BeginTransaction()
var customer = createNewCustomer();
session.Save(customer);
    
for (var i = 0; i < 5; i++)
{
    var order = createNewOrder("PO" + i);
    order.Customer = customer;
    customer.SalesOrders.Add(order);
    session.Save(order);
}
    
xaction.Commit();

Notice that, apart from committing the transaction, the only difference is the two calls to session.Save(). After I cover the cascading options, I’ll show you how to get rid of one of those save calls.

Retrieving the Collection

By default, when you get a Customer from the database, the SalesOrders list will be an uninitialized proxy collection for lazy loading purposes. When you first enumerate the collection (with foreach, or by passing the list to some method that loops over it), the proxy triggers a lazy load of all the related SalesOrderHeader objects. As I mentioned before, this is only the default; you can configure this behavior.

var customer = session.Get<Customer>(customerId);
foreach( var order in customer.SalesOrders )
{
    Console.WriteLine(order.PurchaseOrderNumber);
}
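The mechanics of that default can be illustrated with a toy lazy collection in plain C#. This is not NHibernate’s implementation (NHibernate uses its own persistent collection classes); the LazyList<T> name and the loader delegate, which stands in for the database query, are hypothetical. In real code, you can ask NHibernate whether a proxy or collection has been initialized with NHibernateUtil.IsInitialized().

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Toy lazy collection: nothing is loaded until the first enumeration.
// Illustrates the idea only; it is not how NHibernate implements it.
public class LazyList<T> : IEnumerable<T>
{
    private readonly Func<IList<T>> loader;  // stands in for the database query
    private IList<T> items;                  // null until initialized

    public LazyList(Func<IList<T>> loader)
    {
        if (loader == null) throw new ArgumentNullException("loader");
        this.loader = loader;
    }

    public bool IsInitialized
    {
        get { return items != null; }
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (items == null)
            items = loader();  // the "lazy load" happens here, exactly once
        return items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```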

Other Types of Relationships

NHibernate supports other types of relationships as well such as many-to-many, ternary, and heterogeneous relationships (i.e., many-to-any).

I won’t cover every single type of relationship in this article as it would be too long. Each relationship is very similar in its mapping and behavior to the OTM/MTO. The NHibernate online documentation has good coverage of all of these and would serve as a better reference for all the details.

A Little Many-to-Many Wisdom

I want to make one last point about many-to-many relationships, which often trip up people using an O/RM framework like NHibernate: A many-to-many relationship is a relationship between two tables with an intermediate “join table” in the middle. This “join table” has either two or three columns: a column holding the ID of each of the joined tables and, optionally, a third column holding the unique ID of that row/join.

If you map a many-to-many relationship bi-directionally, make sure to mark one of the directions as “inverse.” It doesn’t matter to NHibernate which is the inverse, so you must choose one based on whatever makes sense to your domain and code.

You should also note that you may run into a situation where you have what appears to be a many-to-many relationship, but the “join table” has extra, non-ID data columns in it. This is not a real many-to-many relationship. It is, in fact, a ternary relationship. Another way of looking at it is two OTM/MTO relationships between the two main tables and the intermediate table (i.e., the “join table” with extra information). You can think of it logically as a many-to-many relationship, but you should model it as a ternary relationship.
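On the object side, modeling the join table as its own entity might look like this sketch. The Address and CustomerAddress classes and the AddressType column are hypothetical; the point is that CustomerAddress carries two many-to-one references (one to each main entity) plus its own data, while each main entity holds a one-to-many collection of CustomerAddress objects.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    private IList<CustomerAddress> addresses = new List<CustomerAddress>();
    public virtual IList<CustomerAddress> Addresses { get { return addresses; } }
}

public class Address
{
    private IList<CustomerAddress> customers = new List<CustomerAddress>();
    public virtual string City { get; set; }
    public virtual IList<CustomerAddress> Customers { get { return customers; } }
}

// The "join table" promoted to an entity: two many-to-one ends plus extra data.
public class CustomerAddress
{
    protected CustomerAddress() { }  // parameterless constructor for the O/RM

    public CustomerAddress(Customer customer, Address address, string addressType)
    {
        if (customer == null) throw new ArgumentNullException("customer");
        if (address == null) throw new ArgumentNullException("address");
        Customer = customer;
        Address = address;
        AddressType = addressType;
        // Wire up both one-to-many collections so the in-memory graph is consistent.
        customer.Addresses.Add(this);
        address.Customers.Add(this);
    }

    public virtual Customer Customer { get; protected set; }
    public virtual Address Address { get; protected set; }
    public virtual string AddressType { get; protected set; }  // the extra, non-ID column
}
```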

More Complex Operations

Up until now, I have only talked about basic, standard operations with NHibernate and haven’t talked too much about some of the more interesting features that NHibernate brings to the table. I touched briefly on transparent lazy loading which, in and of itself, is a really nice feature. But I haven’t touched at all upon cascading operations, fetching strategies, querying, projections, aggregations, or any of the advanced patterns or best practices. In the next few sections, I’ll start getting into these more advanced topics.

Lazy Loading

NHibernate, by default, will lazy load all related objects and collections. You can turn this off entirely by setting the “default-lazy” attribute to “false” in the <hibernate-mapping> element in your HBM XML file(s). You can also override the default-lazy setting on a case-by-case basis by setting the “lazy” attribute to “true” or “false” on the <class>, <bag>, <list>, and a few other mapping tags.

NHibernate uses the Castle.DynamicProxy2 project to generate its lazy load proxy classes. These proxies will intercept any call and trigger NHibernate to retrieve the object or objects represented by the proxy.

In order to lazy load an individual referenced object (such as the Customer object in the SalesOrderHeader-to-Customer relationship), NHibernate must be able to subclass its class (that is, the class cannot be private, internal, sealed, and so on), and the class must have a parameterless constructor that is public or protected. All the mapped properties must be marked as virtual (Overridable in VB) so that the proxy can override them and detect any attempt to access data through the proxy (thus triggering a database retrieval).
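To see why the virtual requirement exists, here is a hand-written version of the kind of subclass a proxy generator such as Castle DynamicProxy creates at runtime. It is a simplified illustration: CustomerProxy, the Name property, and the loader delegate (standing in for the database query) are hypothetical, and real NHibernate proxies do considerably more. If Name were not virtual, code holding a Customer reference would call Customer.Name directly, the proxy could not intercept it, and the caller would silently read the proxy’s own empty value instead of triggering a load.

```csharp
using System;

public class Customer
{
    public virtual int CustomerID { get; set; }
    public virtual string Name { get; set; }
}

// Hand-written stand-in for a runtime-generated lazy-loading proxy.
public class CustomerProxy : Customer
{
    private readonly int id;
    private readonly Func<int, Customer> loader;  // stands in for the database query
    private Customer target;                      // the real object, once loaded

    public CustomerProxy(int id, Func<int, Customer> loader)
    {
        if (loader == null) throw new ArgumentNullException("loader");
        this.id = id;
        this.loader = loader;
    }

    public bool IsLoaded
    {
        get { return target != null; }
    }

    private Customer Target
    {
        get
        {
            if (target == null)
                target = loader(id);  // first access to real data triggers the load
            return target;
        }
    }

    // The identifier is already known, so reading it does not hit the database.
    public override int CustomerID
    {
        get { return id; }
    }

    // Other virtual members are forwarded to the loaded object.
    public override string Name
    {
        get { return Target.Name; }
        set { Target.Name = value; }
    }
}
```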

It would be nice if NHibernate didn’t have this virtual requirement, since it is a small violation of persistence ignorance (one of my requirements of a good O/RM). However, this violation is minor in my opinion and does not require much compromise in my object design, so I’m willing to accept it, especially given all the other functionality I get in return.

A Warning About Lazy Loading and Performance

Lazy loading makes things very easy to get up and going, but I would be remiss if I didn’t mention that it can also lead to performance problems if not properly analyzed. For small projects without a lot of performance concerns, you can generally use lazy loading freely. For larger projects under high load, where index usage and cache hits are very important, lazy loading can backfire on you and cause more problems than it solves. Before your project gets too far along, consider turning on NHibernate’s SQL logging and analyzing the SQL it generates, and when it executes statements against the database, to make sure it isn’t placing an undue burden on your database.

I don’t advocate doing this analysis up front or continuously as you go, because most of the problems you run into can be fixed by adjusting the mappings slightly, without having to modify much code. Also, obsessing over performance early in an average line-of-business application project is a well-known anti-pattern (premature optimization) and will usually set you back more than it helps.

Just keep in mind that NHibernate does not give you an excuse to ignore database performance optimization; it just makes that optimization easier and localizes it to a single point of configuration.

Cascading Operations

Another feature of NHibernate that keeps the underlying persistence mechanism transparent is the cascading of operations from a parent to its related child objects.

By default, NHibernate will not cascade any operations from a parent entity to a child entity. You can change this default by setting the “default-cascade” attribute on the <hibernate-mapping> element. I don’t recommend doing this until you get familiar with how cascading works and are more aware of the ramifications of this setting.

If you are mapping a relationship only one way, set the “cascade” attribute on the mapping element for that single direction (a <many-to-one>, or a collection element such as <bag>) to control which operations performed on the owning entity are cascaded to the entity or entities it references.

If you’re mapping a relationship bi-directionally, you should set the “cascade” value on the parent-to-child mapping, that is, on the <set> or <bag> element itself (the nested <one-to-many> element does not take a “cascade” attribute).

There are several options available for the value of the “cascade” attribute. They have important behaviors and consequences so read carefully before choosing the one that’s right for any particular relationship:

none : Default. NHibernate will not cascade anything.
save-update : Saves (inserts) and updates performed on the parent are cascaded to the children (each unsaved or “dirty” child will be saved/updated).
delete : Only a delete operation performed on the parent will be cascaded to children.
all : Saves, updates, and deletes will be cascaded from the parent to the children. NOTE: If a parent loses its reference to a child, the child may become orphaned in the database. This is why the all-delete-orphan option exists.
all-delete-orphan : Similar to all, but also cleans up orphaned children.
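The list above can be summarized as a small C# lookup. This is only a model of the save/update/delete behavior described in the list, not NHibernate code; the CascadeAction enum and CascadeStyles class are hypothetical, and NHibernate’s own cascade handling covers more operations than these.

```csharp
using System;

// Which parent operations are propagated to the children (model only).
[Flags]
public enum CascadeAction
{
    None = 0,
    SaveUpdate = 1,
    Delete = 2,
    DeleteOrphan = 4
}

public static class CascadeStyles
{
    // Maps each "cascade" value from the list above to the actions it propagates.
    public static CascadeAction Parse(string cascade)
    {
        switch (cascade)
        {
            case "none":
                return CascadeAction.None;
            case "save-update":
                return CascadeAction.SaveUpdate;
            case "delete":
                return CascadeAction.Delete;
            case "all":
                return CascadeAction.SaveUpdate | CascadeAction.Delete;
            case "all-delete-orphan":
                return CascadeAction.SaveUpdate | CascadeAction.Delete | CascadeAction.DeleteOrphan;
            default:
                throw new ArgumentException("Unknown cascade value: " + cascade, "cascade");
        }
    }
}
```

The only difference between “all” and “all-delete-orphan” in this model is the DeleteOrphan flag, which matches the note on orphaned children above.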

When you map a parent to children using the “all” or “all-delete-orphan” cascade option, you call the parent a “lifecycle object” because its entire lifecycle is cascaded to its children. Master/Detail relationships (i.e., Order/LineItem) are frequently lifecycle objects. Lifecycle objects are important because they generally behave the most naturally (the way you would expect a master/detail object structure to behave) and don’t require much extra coding, if any, to satisfy the underlying requirements of the persistence mechanism for cascading.

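// session: an open ISession; xaction: the ITransaction returned by session.BeginTransaction()
var customer = session.Get<Customer>(customerId);
// When mapped with all or all-delete-orphan,
// NH will delete all associated orders, too
session.Delete(customer);
xaction.Commit();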

Database Sorting, Filtering, and Batching for Collections

In some cases, when retrieving entities from the database, you may want the database to do some extra work. A common example is when you have used an application for a few weeks, have noticed NHibernate doing some things inefficiently, and want to optimize it a little. Three common problems, each with a solution you might consider, are: select N+1 (solved with batching), sorting that could happen at the database, and filtering that could happen at the database.

Select N+1 and Batching

In the case of Customer and SalesOrders, you may notice that you routinely load a number of customers and then iterate over each Customer and over each Customer’s sales orders (for example, displaying them in a two-level grid on a screen). This results in one SELECT statement to retrieve the Customer records and then one more SELECT per Customer to initialize that Customer’s SalesOrders collection, which is where the name “select N+1” comes from. To combat this problem, you can instruct NHibernate to initialize several collections at once by setting the “batch-size” attribute on the <set> or <bag> mapping. Note that this number is how many Customers’ SalesOrders collections are loaded per SELECT, not how many orders each Customer has. Let’s say, on average, you load about 10 Customers at a time. Set the “batch-size” attribute on the SalesOrders <bag> mapping to “10.” After doing this, the first time you access the SalesOrders collection on one of those Customer objects, NHibernate should load the sales orders for all ten Customers in a single SELECT statement. The “batch-size” attribute on the Customer <class> mapping works the same way in the other direction: when you iterate over many SalesOrderHeader objects and touch each one’s lazy Customer reference, NHibernate can initialize up to that many Customer proxies per SELECT.
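The arithmetic behind collection batching can be sketched as follows. This is a simplified model, not NHibernate code: it assumes the Customers arrive in one query and that every Customer’s SalesOrders collection is touched once, and it treats a collection’s batch-size, as NHibernate does, as the number of Customers whose collections are initialized per SELECT. NHibernate may split the batches somewhat differently in practice, and the SelectCounter name is hypothetical.

```csharp
using System;

// Simplified model of select N+1 with and without collection batching.
public static class SelectCounter
{
    public static int CountSelects(int customerCount, int collectionBatchSize)
    {
        if (customerCount < 0) throw new ArgumentOutOfRangeException("customerCount");
        if (collectionBatchSize < 1) throw new ArgumentOutOfRangeException("collectionBatchSize");

        // One query for the customers themselves...
        int selects = 1;
        // ...plus one query per batch of uninitialized SalesOrders collections
        // (integer ceiling of customerCount / collectionBatchSize).
        selects += (customerCount + collectionBatchSize - 1) / collectionBatchSize;
        return selects;
    }
}
```

With ten Customers, an unbatched mapping (equivalent to a batch-size of 1) costs 11 SELECTs, a batch-size of 3 costs 5 (batches of 3, 3, 3, and 1), and a batch-size of 10 brings it down to 2.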

Ordering at the Database

It may be the case that you frequently retrieve the list of SalesOrderHeader objects associated with a given Customer object and then sort them the same way every time. Sorting them in memory after every retrieval is wasteful, especially when the database could do the sorting for you, possibly using an index. You can set a default database ordering (the ORDER BY clause of the SELECT statement) by using the “order-by” attribute on the <set>, <bag>, and other collection-type elements. For example, to order by OrderDate descending, set the value to “OrderDate desc.”
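For comparison, the in-memory sort that “order-by” replaces might look like this LINQ sketch (LINQ requires .NET 3.5; the OrderSorting helper is hypothetical and the OrderDate property is assumed to map to the OrderDate column). With order-by="OrderDate desc" in the mapping, the collection should already arrive in this order, so the sort can be dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SalesOrderHeader
{
    public virtual string PurchaseOrderNumber { get; set; }
    public virtual DateTime OrderDate { get; set; }
}

public static class OrderSorting
{
    // The in-memory equivalent of ORDER BY OrderDate DESC, repeated on every load.
    public static IList<SalesOrderHeader> NewestFirst(IEnumerable<SalesOrderHeader> orders)
    {
        return orders.OrderByDescending(o => o.OrderDate).ToList();
    }
}
```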

Filtering at the Database

Let’s say that you have a requirement that, when sales orders are deleted, you don’t actually delete them from the database, but simply set an “IsDeleted” or “IsArchived” column to 1 (true), after which they should never show up in the application again. They remain in the database, but the application should consider them deleted for all purposes. In this case, consider setting the “where” attribute on the <set>, <bag>, etc., element to filter those records out of your application permanently. For example, set its value to “IsDeleted = 0.”

NOTE: This is not how you generally accomplish basic WHERE clause functionality when querying. Use this only for a situation where you need a specific subset of the data actually in the database and the application should never see all the other data. This is useful for situations like the one mentioned above, or if you partition data in a single database for multiple uses (e.g., sales data by region).
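Conceptually, the “where” attribute described above has the same effect as the in-memory filter in this sketch, except that NHibernate puts the condition into the SQL it issues when loading the collection, so the deleted rows never leave the database and never become objects. The IsDeleted property and the OrderFiltering helper are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SalesOrderHeader
{
    public virtual string PurchaseOrderNumber { get; set; }
    public virtual bool IsDeleted { get; set; }  // hypothetical soft-delete flag
}

public static class OrderFiltering
{
    // "Deleting" only flags the order; the row stays in the database.
    public static void SoftDelete(SalesOrderHeader order)
    {
        if (order == null) throw new ArgumentNullException("order");
        order.IsDeleted = true;
    }

    // The in-memory equivalent of a where="IsDeleted = 0" collection mapping.
    public static IList<SalesOrderHeader> ActiveOrders(IEnumerable<SalesOrderHeader> orders)
    {
        return orders.Where(o => !o.IsDeleted).ToList();
    }
}
```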

Conclusion

In this two-part article I have covered some of the basic reasons why you might want to use an O/RM as well as the important aspects or features to look for in a high quality O/RM framework. I have also covered getting started with NHibernate, including setting up a simple data and object model. I mapped the two together and performed some basic CRUD operations with transparent lazy loading and cascading operations.

There is still quite a lot to cover around NHibernate, such as querying support via the ICriteria interface and usage patterns such as session-per-request for ASP.NET Web applications, as well as the Unit of Work and Repository patterns. Perhaps I’ll cover these in a follow-up article, but in the meantime, you can learn more about these advanced topics at the NHibernate Web site linked in Part 1 of this article or visit some of the many excellent FAQ and community learning resource Web sites, such as the “Hibernating Rhinos” blog:

http://blogs.hibernatingrhinos.com/nhibernate

Finally, I would be remiss in my duties if I did not mention one of the great NHibernate-related projects that has helped many developers who are just getting started with NHibernate to get going quicker: Fluent NHibernate. This project’s aim is to reduce the friction and common mistakes involved in using the HBM XML for mapping objects with NHibernate. They do this by creating the mappings in code using a .NET API and a technique known as static reflection. The mapping is then compiler-enforced, which helps when refactoring your domain. You can learn more about Fluent NHibernate at its home page:

http://fluentnhibernate.org/