SQL Server Katmai, now officially announced as SQL Server 2008, introduces a significant amount of new and improved functionality, including enhanced data types and greater programming flexibility for database application developers.

In this article I’ll look at the range of data access technologies available for leveraging the power of SQL Server 2008. After I show you how to select the right technology for your needs, I’ll dive into the improvements, first examining a new construct for programming data. Then I’ll turn to expanding the set of building blocks (data types) available to developers. Taking a brief pause, I’ll discuss how SQL Server 2008 removes limitations for some scenarios. Then, building on what the article has covered, I’ll examine new out-of-the-box solutions for common application scenarios. I hope the article leaves you with a new understanding of the possibilities for taking advantage of Katmai: SQL Server 2008.

Choosing a Data Access Technology

SQL Server offers developers a wide range of data access technologies, so the best place to start a discussion of the new programmability features is how to choose the right one. For new application development, using the .NET Framework, and specifically the SQL Server provider in ADO.NET, System.Data.SqlClient, will be the best choice in most cases. If you’re working within an existing application whose business logic and data access code are in native C/C++, you can choose between the SQL Server Native Client ODBC driver and the OLE DB provider. Both options let you take advantage of the full feature set of SQL Server 2008, and the choice will usually come down to your application requirements.

Additional data access options include Windows Data Access Components (WDAC), which is new in Windows Vista and was previously named Microsoft Data Access Components (MDAC), and the Microsoft SQL Server 2005 JDBC Driver for Java environments. For purposes of this discussion, I’ll focus on SqlClient and SQL Server Native Client; for more information on these and other data access technologies, visit http://msdn.com/data.

To get the most out of the new functionality, you’ll need to use .NET Framework 3.5 or SQL Server Native Client 10.0, which installs side by side with previous versions. One important takeaway is that you don’t have to rewrite your application from the ground up to take advantage of SQL Server 2008. Instead, you can gain significant value from incremental changes to your existing data access layer. With that said, let’s dive in.

New Programming Constructs: Table-Valued Parameters

One of the new features most requested by developers is the ability to cleanly encapsulate tabular data in a client application, ship it to the server in a single command, and then continue to operate on the data as a table in T-SQL. The simplest such use case is the long-desired array data type. Traditionally, applications have addressed this need by doing one of the following:

  • Defining stored procedures with large numbers of parameters and pivoting the scalar parameter data into rows.
  • Using an out-of-band bulk insert mechanism like SqlBulkCopy in ADO.NET to create a temporary table.
  • Using parameter arrays in the application and repeatedly executing logic that operates on a scalar “row” of data.

None of these solutions is ideal. The pivoted parameter approach, besides being ungraceful, produces code that is difficult to maintain and tough to evolve when the time comes to add a new “column” to the conceptual “table.” The bulk insert approach is clumsy when the application needs to do any additional filtering or apply more complex logic. The parameter array approach may perform well for small data volumes, but it becomes untenable for larger batches, both on the client, where memory consumption can become a problem, and on the server, where per-row invocations of the procedure perform poorly.

Table-valued parameters (TVPs) address all of these problems, providing an improved programming model and significant performance benefits in certain scenarios.

Imagine a simple order processing scenario. Using TVPs starts with defining a table type in T-SQL on the server:

-- TSQL to CREATE a TABLE TYPE tt_OrderItems
    
CREATE TYPE tt_OrderItems AS TABLE (
  [ItemId] int NOT NULL,
  [Quantity] int NOT NULL)

Next, create a stored procedure that uses the table type you just created and additionally takes the customer who placed the order as a parameter:

-- TSQL to CREATE a PROCEDURE sp_AcceptOrder
-- that performs set-based operation on TVP
    
CREATE PROCEDURE sp_AcceptOrder (
  @CustomerId int,
  @OrderItems tt_OrderItems READONLY)
AS
INSERT dbo.AcceptedOrders
    SELECT O.ItemId, O.Quantity
    FROM @OrderItems AS O
       INNER JOIN dbo.Items AS I
       ON O.ItemId = I.ItemId
    WHERE I.BackOrdered = 0

This example is fairly simple, but it illustrates a big win for developers: if you can update an application to implement its business logic with set-based operations on a batch of data, you should see significant performance gains.

Now let’s look at the application changes needed to use table-valued parameters. Client applications generally have two possible programming models:

  • Bind in-memory table data as a parameter. This is usually the simplest and fastest model to code, at the expense of scalability: for large batches of data, memory consumption in the application grows accordingly.
  • Stream row data to the provider from a disk or network-backed data source. This model takes a bit more code to set up, with the advantage of a fixed memory usage profile in the application.

For small batches of data, the performance difference on the client will usually be negligible, so choosing the simpler programming model may be your best choice. On the server side there will be no performance difference between the models.

Diving into the details for a moment, ADO.NET supports the first model by extending the SqlParameter class to take a DataTable as a value; DataTable’s ubiquity in data application programming makes it an ideal choice for this simple model. SQL Server Native Client OLE DB accomplishes the same thing by leveraging the COM IRowset interface and introducing an implementation that allows buffering. SQL Server Native Client ODBC uses a similar method modeled after parameter arrays, where applications allocate and bind arrays of buffers.

For the streaming model, ADO.NET supports specifying a DbDataReader or an IEnumerable<SqlDataRecord> as the parameter value, which provides a solution for both external and user-defined data sources. In much the same vein, SQL Server Native Client OLE DB accepts a custom IRowset implementation, and ODBC builds on the data-at-execution (DAE) paradigm by accepting DAE table-valued parameters. All providers also expose a rich set of services for discovering and describing table-valued parameter metadata.
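
To make the two models concrete, here is a minimal C# sketch of both against the sp_AcceptOrder procedure defined earlier. Connection handling and error checking are omitted, and the dbo schema name for tt_OrderItems is assumed:

// C# sketch: two ways to pass a table-valued parameter from ADO.NET
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server; // SqlDataRecord, SqlMetaData

class TvpSketch
{
    // Model 1: bind an in-memory DataTable as the parameter value.
    public static void SendOrderBuffered(SqlConnection conn, int customerId,
                                         DataTable orderItems)
    {
        using (SqlCommand cmd = new SqlCommand("dbo.sp_AcceptOrder", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@CustomerId", customerId);

            SqlParameter tvp = cmd.Parameters.AddWithValue("@OrderItems", orderItems);
            tvp.SqlDbType = SqlDbType.Structured; // marks the parameter as a TVP
            tvp.TypeName = "dbo.tt_OrderItems";   // the server-side table type
            cmd.ExecuteNonQuery();
        }
    }

    // Model 2: stream rows one at a time for a fixed memory footprint;
    // pass the result of this method as the parameter value instead.
    public static IEnumerable<SqlDataRecord> EnumerateItems(
        IEnumerable<KeyValuePair<int, int>> items)
    {
        SqlMetaData[] rowShape = {
            new SqlMetaData("ItemId", SqlDbType.Int),
            new SqlMetaData("Quantity", SqlDbType.Int)
        };
        foreach (KeyValuePair<int, int> item in items)
        {
            SqlDataRecord record = new SqlDataRecord(rowShape);
            record.SetInt32(0, item.Key);   // ItemId
            record.SetInt32(1, item.Value); // Quantity
            yield return record;            // the row ships as it is produced
        }
    }
}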

When you can update an application to use table-valued parameters, it should gain the benefits of having cleaner, more maintainable code in both the client and server application tiers; faster performance, particularly when you use set-based operations on the server side by leveraging the power of the SQL query processor; and better scalability for large data volumes. In addition to enhancing the programming model, SQL Server 2008 introduces new intrinsic data types to better align with the precise needs of applications.

New Building Blocks: New and Enhanced Date & Time Types

In previous releases of SQL Server, the database exposed date and time data primarily through two data types: DATETIME and SMALLDATETIME (Table 1). By the time SQL Server 2005 shipped, these types had begun to show their age, and developers were bumping into their limitations.

You can break down problems with the existing date/time types into five categories:

  • Applications that work in terms of only date or only time data must implement a layer of abstraction over the server data, often writing their own validation routines. While feasible, this increases the burden on developers.
  • Storing only a date or only a time requires considerably less space than storing a combined date and time, so as a database grows, storage costs increase faster than necessary.
  • The existing ranges are often not large enough to represent the data that applications need to handle (as in process control and manufacturing).
  • Time-zone data cannot be represented. Some applications work around this by storing date/time values in UTC and defining an additional column for the offset, but that workaround carries the baggage of performing all the calculations needed to treat the pair as a single type, and it prevents straightforward use of SQL Server’s built-in date/time functions.
  • Many other database vendors (and the ANSI SQL standard) support distinct date, distinct time, and time-zone-aware date/time types, so migrating an application that uses this functionality to SQL Server was sometimes a cumbersome process that might even require changing application requirements.

To address these problems, SQL Server 2008 introduces support for four new types: DATE, TIME, DATETIME2, and DATETIMEOFFSET, along with a rich set of built-in T-SQL functions that work with them:

  • DATE. Provides a broader range of date values than DATETIME, starting at 1/1/1 rather than 1/1/1753, and provides better storage scalability.
  • TIME. Also provides storage scalability over existing types and introduces user-defined, fractional-second precision from 0 to 7. In other words, based on your needs you can define a table column of type TIME to be accurate to the second or 100 nanoseconds and you’ll pay only for the accuracy you need in storage costs.
  • DATETIME2. A composition of DATE and TIME in that it supports both a wider range and a user-defined, fractional-second precision to 100 ns.
  • DATETIMEOFFSET. Includes all of DATETIME2 and additionally introduces time-zone offset support, which should significantly reduce the amount of custom code you need for time-zone-aware applications.

As you can see, all the new types support a broader range and, where appropriate, user-defined fractional-second precision (Table 2). This allows developers to tune the storage size of columns to exactly fit application needs, which, for large databases, translates to significant savings in storage costs. Now let me discuss how the data access stacks expose these types.

The new SQL Server types DATE and DATETIME2 correlate to existing types in all data access technologies: for ADO.NET it’s DateTime for both, for ODBC it’s SQL_DATE and SQL_TIMESTAMP respectively, and for OLE DB it’s DBTYPE_DBDATE and DBTYPE_DBTIMESTAMP. The new SQL types TIME and DATETIMEOFFSET were more difficult to express in some cases because the conceptual type didn’t already exist. While TIME maps to TimeSpan in ADO.NET, Microsoft needed to invent new provider types in ODBC and OLE DB (SQL_SS_TIME2 and DBTYPE_DBTIME2) because the existing types don’t support fractional-second precision. As you’d guess, these match their pre-existing counterparts in every way except for the addition of a fraction component matching the one that already exists in SQL_TIMESTAMP/DBTYPE_DBTIMESTAMP.

The server type DATETIMEOFFSET is unique in that, in addition to containing all the date/time parts of the other types, it also includes a time-zone offset. To accommodate this type, Microsoft introduced new types across the board. The .NET Framework 3.5 release includes a new system type conveniently named DateTimeOffset, while ODBC and OLE DB introduce SQL_TIMESTAMPOFFSET and DBTYPE_DBTIMESTAMPOFFSET, all of which should look familiar to people used to working with their non-time-zone-aware equivalents. These new types are first-class citizens in every way, with the goal of being better data-type alternatives for new application development and replacement options for application enhancements.
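
As a quick illustration, here is a minimal ADO.NET sketch that binds the new types as parameters. The SensorReadings table and its columns are hypothetical, and the SqlDbType values shown are the ones that accompany SqlClient’s SQL Server 2008 support:

// C# sketch: binding the new date/time types as parameters
using System;
using System.Data;
using System.Data.SqlClient;

class DateTimeTypesSketch
{
    // Assumes a hypothetical table, tuned to pay only for needed precision:
    //   CREATE TABLE dbo.SensorReadings (
    //     ReadingDate date,
    //     ReadingTime time(3),           -- millisecond precision
    //     RecordedAt  datetimeoffset(7)) -- 100 ns, time-zone aware
    public static void InsertReading(SqlConnection conn)
    {
        using (SqlCommand cmd = new SqlCommand(
            "INSERT dbo.SensorReadings VALUES (@date, @time, @recordedAt)", conn))
        {
            cmd.Parameters.Add("@date", SqlDbType.Date).Value = DateTime.Today;
            cmd.Parameters.Add("@time", SqlDbType.Time).Value = DateTime.Now.TimeOfDay; // a TimeSpan
            cmd.Parameters.Add("@recordedAt", SqlDbType.DateTimeOffset).Value = DateTimeOffset.Now;
            cmd.ExecuteNonQuery();
        }
    }
}

In addition to introducing new data types and programming constructs, SQL Server 2008 also removes limitations on existing types to open the door to new application scenarios.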

Breaking Barriers: Removing Size Limitations

Now let’s take a close look at the new support for large common language runtime (CLR) user-defined types (UDTs) and the introduction of transactional file-system storage for large object (LOB) column data.

Support for .NET Framework CLR UDTs first appeared in SQL Server 2005 with the goal of providing database application developers a more flexible programming model. UDTs gave developers a way to create complex types and also to express and encapsulate computationally expensive business logic alongside that data. Since the release of SQL Server 2005, customers have been adopting the technology and using it in interesting ways; however, the current 8,000-byte maximum size has limited the set of scenarios UDTs can address.

SQL Server 2008 removes this restriction, allowing a CLR UDT to be of “unlimited” length. In practice, the storage size is capped at the SQL large object (LOB) limit of 2^31 - 1 bytes, or about 2 GB, in much the same fashion as the varbinary(max), varchar(max), and nvarchar(max) types introduced in SQL Server 2005. Theoretically, you can expose any .NET Framework CLR class or structure as a SQL CLR UDT as long as it meets several requirements involving constructors, annotations, and implementing a set of interfaces. Large UDTs in particular must implement a user-defined serialization-to-bytes method that SQL Server relies on both when consuming parameters of that type from clients and when sending column result sets. A value of Format.UserDefined in the SqlUserDefinedTypeAttribute annotation indicates that the UDT has user-defined serialization and also requires the developer to implement the IBinarySerialize interface.
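
Here is a minimal skeleton of what such a large UDT might look like. The Polygon type and its payload are purely illustrative, and the parsing logic is elided:

// C# sketch: skeleton of a large CLR UDT with user-defined serialization
using System;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedType(Format.UserDefined, MaxByteSize = -1)] // -1 requests the new large (LOB-bound) size
public struct Polygon : INullable, IBinarySerialize
{
    private bool isNull;
    private double[] points; // illustrative payload

    public bool IsNull { get { return isNull; } }

    public static Polygon Null
    {
        get { Polygon p = new Polygon(); p.isNull = true; return p; }
    }

    public static Polygon Parse(SqlString s)
    {
        if (s.IsNull) return Null;
        Polygon p = new Polygon();
        p.points = new double[0]; // ... parse s.Value into points ...
        return p;
    }

    public override string ToString()
    {
        return isNull ? "NULL" : points.Length + " points";
    }

    // SQL Server calls these to turn the value into bytes and back, both
    // when persisting it and when shipping it to and from clients.
    public void Write(BinaryWriter w)
    {
        w.Write(points.Length);
        foreach (double d in points) w.Write(d);
    }

    public void Read(BinaryReader r)
    {
        int count = r.ReadInt32();
        points = new double[count];
        for (int i = 0; i < count; i++) points[i] = r.ReadDouble();
    }
}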

The analogy with the other “max” types remains useful when describing how to take advantage of large UDTs in client applications. Integration into applications currently using UDTs should be seamless, with only minor changes in the metadata returned for columns and parameters of these types, allowing discriminating applications to distinguish large UDTs from their less-than-8,000-byte counterparts. To see the difference in result sets, ADO.NET applications invoke the SqlDataReader GetSchemaTable() method, which returns a DataTable, and examine the ColumnSize column, where a value of “-1” indicates a large UDT. For parameters, they examine or set the SqlParameter Size property, where “-1” has the same meaning. In ODBC, you specify and distinguish large UDTs with the SQL_SS_LENGTH_UNLIMITED macro originally introduced for the “max” types. In OLE DB, you represent the large UDT column or parameter length as “~0”.
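
Here is a short ADO.NET sketch of both checks; the dbo.Polygon type name is carried over from the illustrative skeleton above:

// C# sketch: detecting and declaring large UDTs in ADO.NET
using System.Data;
using System.Data.SqlClient;

class LargeUdtSketch
{
    // Result sets: a ColumnSize of -1 in the schema table marks a large UDT.
    public static bool HasLargeUdtColumn(SqlDataReader reader)
    {
        DataTable schema = reader.GetSchemaTable();
        foreach (DataRow column in schema.Rows)
        {
            if ((int)column["ColumnSize"] == -1)
                return true;
        }
        return false;
    }

    // Parameters: a Size of -1 declares the parameter as a large UDT.
    public static SqlParameter MakeLargeUdtParameter(SqlCommand cmd)
    {
        SqlParameter p = cmd.Parameters.Add("@shape", SqlDbType.Udt);
        p.UdtTypeName = "dbo.Polygon"; // hypothetical UDT from the sketch above
        p.Size = -1;
        return p;
    }
}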

In addition to being able to fully utilize the functionality of the CLR UDT on both the server and the application tier when working in ADO.NET, user-defined serialization allows native applications to access a UDT as a stream of bytes with a well-defined format. This, in turn, enables scenarios where applications interpret those bytes by parsing them or by overlaying structure, providing business logic in middle tiers or client applications similar to what exists as CLR methods in the assembly. While these techniques can address a range of scenarios involving structured data, a different limitation removal assists the growing number of document-management-style applications.

On the unstructured data side of the fence, SQL Server 2008 also enables applications to store unstructured data directly in the file system, outside the database file, leveraging the rich streaming APIs and performance of the Windows file system. Though accessible through the file system, the data remains compatible with the T-SQL programming model. Using this dual programming model, T-SQL and Win32, applications can maintain transactional consistency between unstructured data stored in the file system and structured data stored in the relational database.

You use this functionality, SQL Server Filestream, by adding a new storage attribute, FILESTREAM, to varbinary(max) columns, a type introduced in SQL Server 2005. The beauty of the feature is that, aside from removing the 2 GB LOB storage limitation, filestream columns work seamlessly in other operations, including data manipulation language (DML) operations like SELECT, INSERT, DELETE, and MERGE. In case you haven’t already heard, MERGE is yet another new SQL Server 2008 programmability feature that allows expressing multiple other DML operations in one statement, but that’s a detail for another day.

Getting back to Filestream, take a look at how it is exposed in the different data access stacks. Because the only real change from the database application perspective is the difference in maximum size, existing programming patterns continue to work unchanged. Having said that, once you’ve made the transition to storing large data sizes, streaming data both into and out of the server becomes more important for your application’s scalability. Even though the coding patterns for streaming haven’t changed in SQL Server 2008, it’s worth a quick refresher. For ODBC, the application binds parameters using SQLBindParameter with ColumnSize set to SQL_SS_LENGTH_UNLIMITED and sets the contents of StrLen_or_IndPtr to SQL_DATA_AT_EXEC before it calls SQLExecDirect or SQLExecute. Result columns must be unbound and retrieved via SQLGetData. For OLE DB, the application uses DBTYPE_IUNKNOWN in parameter and result bindings; for parameters, the consumer implements an ISequentialStream, and for results, the provider returns an ISequentialStream interface.

For optimal streaming performance, applications can use the Win32 file access APIs to read or write data using a UNC path returned by the new server built-in function PathName(), available on filestream columns. The advantage of streaming through the Win32 API rather than T-SQL grows with the size of the data, and the benefit can be seen with data as small as 1 MB. SQL Server Native Client also includes a new method, OpenSqlFilestream, that combines and simplifies the operations of opening the file and associating it with an existing SQL Server transaction context obtained from another new server built-in function, Get_filestream_transaction_context().
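
To make the pattern concrete, here is a minimal managed sketch. The Photos table and its PhotoData filestream column are hypothetical, and SqlFileStream, the managed counterpart to OpenSqlFilestream, shipped in a later .NET Framework 3.5 service release:

// C# sketch: reading a filestream column through the file system
using System.Data.SqlClient;
using System.Data.SqlTypes; // SqlFileStream
using System.IO;

class FilestreamSketch
{
    public static byte[] ReadPhoto(SqlConnection conn, int photoId)
    {
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            string path;
            byte[] txContext;

            // Ask the server for the file's UNC path and a token tying
            // file access to this transaction.
            using (SqlCommand cmd = new SqlCommand(
                "SELECT PhotoData.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                "FROM dbo.Photos WHERE PhotoId = @id", conn, tx))
            {
                cmd.Parameters.AddWithValue("@id", photoId);
                using (SqlDataReader r = cmd.ExecuteReader())
                {
                    r.Read();
                    path = r.GetString(0);
                    txContext = (byte[])r.GetValue(1);
                }
            }

            // Stream the data with file-system semantics, still inside the
            // same transaction. (A production reader would loop on Read.)
            using (SqlFileStream file =
                new SqlFileStream(path, txContext, FileAccess.Read))
            {
                byte[] data = new byte[file.Length];
                file.Read(data, 0, data.Length);
                tx.Commit();
                return data;
            }
        }
    }
}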

Building on New Foundations: System Types

Beyond the plumbing and infrastructure improvements discussed so far, SQL Server 2008 introduces two new CLR system types as out-of-the-box solutions for common application scenarios:

  • HierarchyId. Allows database applications to model tree structures like organizations or file systems in a more efficient way than currently possible. The type includes a rich set of functions for answering questions about relationships between nodes in the tree and reorganizing the tree.
  • Geometry. Implements the Open Geospatial Consortium (OGC) geometry type and encapsulates the entire OGC type hierarchy.

These CLR system types are available in every database, and the CLR assemblies are available as a standalone redistributable package for applications to install, with classes named SqlHierarchyId and SqlGeometry respectively. This allows .NET Framework applications in the middle tier or client to interact with the types like any other class. Non-.NET Framework applications will generally rely on server-side conversion or serialization to a documented well-known format. In the case of the Geometry type, applications can retrieve the serialized bytes when issuing queries by using a new built-in function, choosing between STAsText() and STAsBinary(); these provide, respectively, the OGC well-known text and well-known binary formats. Alternatively, for all system types, developers can create user-defined functions on the server that operate on the data, providing access to the full functionality of the type.
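
For instance, any client that just wants text can ask the server to serialize spatial values, as in this small sketch (the Shapes table and Shape column are hypothetical):

// C# sketch: retrieving geometry values as OGC well-known text
using System;
using System.Data.SqlClient;

class SpatialSketch
{
    public static void DumpShapes(SqlConnection conn)
    {
        // STAsText() runs on the server, so the client sees plain strings
        // and needs no knowledge of the CLR type.
        using (SqlCommand cmd = new SqlCommand(
            "SELECT ShapeId, Shape.STAsText() FROM dbo.Shapes", conn))
        using (SqlDataReader r = cmd.ExecuteReader())
        {
            while (r.Read())
                Console.WriteLine("{0}: {1}", r.GetInt32(0), r.GetString(1));
        }
    }
}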

Putting the Pieces Together

In summary, SQL Server 2008 introduces a new programming construct, table-valued parameters, to more rationally model scenarios where applications operate on tables of data. Table-valued parameters provide better performance and scalability than existing solutions, particularly when you can implement server-side logic with set-based operations. Along with this new programming construct, new intrinsic data types handle date and time data more efficiently and will help developers reduce custom code by choosing a type that closely matches the application’s needs, with the added benefit of aligning database storage costs to those same needs. For existing types, SQL Server 2008 removes two limitations: lifting the 8,000-byte size limit on CLR user-defined types and allowing you to store and access binary large object columns through the file system. Building on the enhanced CLR type support, SQL Server 2008 also introduces two system types to provide out-of-the-box solutions for planar spatial and hierarchical data. This is far from an exhaustive list of new programming features, but it should start you thinking about how to get the most out of SQL Server 2008.