In the first game of this doubleheader (the last issue of CODE Magazine), I covered 13 new database and T-SQL features in SQL Server 2012. Well, it’s the second game of the doubleheader, and the nightcap features 13 new features in SQL Server Integration Services 2012. SSIS has always been a good time, and now it’s an even better tool with enhancements and improvements over prior versions. Even if you had a love/hate relationship with SSIS before, you’ll find that Microsoft paid special attention to SSIS 2012.

Starting Lineup for Game 2

Normally in Baker’s Dozen tradition, I say, “What’s on the menu?” This time, I’m saying, “The starting lineup is as follows:”

  • New Development Editor - use of SQL Server Data Tools 2010
  • New Shared Connection managers to simplify the connection manager process across packages in a project
  • SSIS parameters at the project, package, and task level
  •     Baker’s Dozen Spotlight: a new variable expression task, to eliminate instances where scripts are necessary
    
  • New UNDO/REDO functionality in the data flow editor
  • New SSIS Expression Language Features
  • New native ODBC Data Flow Source and Destination Components
  • Greatly improved recovery from data lineage and invalid metadata reference issues
  • New Data Taps functionality to programmatically “tap” into a Data Flow pipeline
  • New SSIS tasks to support Change Data Capture
  • New Deployment features in SSIS 2012
  • A new SSIS Server Management Dashboard feature
  •     The Baker’s Dozen Potpourri - Miscellaneous new features in SSIS 2012
    

SQL Server 2012 Released to Manufacturing!

As of this writing (early March 2012), Microsoft has released SQL Server 2012 to manufacturing, and has set a release date of early April 2012 for general availability. With the new column store index, T-SQL features, SSIS features that I’ll talk about in this article, and new Business Intelligence features that I’ll talk about in subsequent articles, SQL Server 2012 is an industry game changer!

Tip 1: New SSIS Development Environment using SQL Server Data Tools

Prior to SSIS 2012, SSIS developers used Business Intelligence Development Studio, which was a shell of Visual Studio 2008. Some who used SSIS 2008R2 were (understandably) upset that even the R2 version (released in 2010) still used the VS2008 shell, as opposed to the updated WPF-based Visual Studio 2010 shell.

Fortunately, the “planets now align” - SSIS 2012 uses the WPF-based Visual Studio 2010 shell. The SSIS development editor is much more visually appealing. Although this might not be critical for experienced ETL developers, a better looking UI will help with the appeal for new ETL developers. Figure 1 and Figure 2 show the control flow and data flow for an SSIS package in the new SSDT environment. (At the end of this article, I’ll talk about what this package does.)

Figure 1: The control flow for an SSIS package in the new SSDT environment uses the WPF framework.
Figure 1: The control flow for an SSIS package in the new SSDT environment uses the WPF framework.
Figure 2: The data flow for an SSIS package in the new SSDT environment has the pipeline in blue instead of the old green color.
Figure 2: The data flow for an SSIS package in the new SSDT environment has the pipeline in blue instead of the old green color.

Tip 2: New Shared Connection Managers

In prior versions of SSIS, connection managers (for OLE DB, FTP, SMTP, flat file, and other connections) were scoped to individual packages, not projects. An SSIS project did not allow shared connections across packages inside the project. This meant that developers had to copy/paste connections and connection expressions from package to package.

(There was a workaround - a developer could create a package template with the necessary base connections, and then create new packages from the base package. However, all SSIS does is create a copy from the base package template, with no lineage back to the template. So any changes to the connection manager in the template will not “ripple through”.)

Figure 3:The new structure in SSIS 2012 lets you create project-level Connection Managers and use them throughout your project.
Figure 3:The new structure in SSIS 2012 lets you create project-level Connection Managers and use them throughout your project.

Figure 3 shows the new structure in SSIS 2012, where a developer can create project-level connection managers and then use them throughout the packages in the project. Any changes to the connection managers ripple through to packages that use them.

Tip 3: SSIS Parameters at the Project, Package, and Task Level

A common question in SSIS is how to pass parameters to packages. Prior to SSIS 2012, there was no direct way: you had to pass variables in parent-child configurations. This required some work and was moderately difficult to debug.

Microsoft has added full parameter capabilities for SSIS projects and packages. Figure 4 shows an example of the new tab in the SSIS package editor.

Figure 4: The new tab in the SSIS package editor defines package parameters.
Figure 4: The new tab in the SSIS package editor defines package parameters.

Tip 4: Baker’s Dozen Spotlight: A New Variable Expression Task

For years, I’ve used SSIS and taught SSIS. As much as I love the product, there were a few features that I felt strongly should behave differently. For instance, any time I need to programmatically set the value of an SSIS variable (so that I can take advantage of the variable later in the package), I need to “drop down to the basement” in SSIS and write a short script in either C# or Visual Basic to manipulate the value of the variable (using the weakly-typed Dts.Variables collection).

Certainly, developers shouldn’t be afraid to write .NET code when the need arises. (I still vaguely recall my past life as a C# developer/MVP.) However, one can argue that mundane tasks such as incrementing/accumulating variables or building variables from dynamic expressions should be an appropriately unceremonious process.

The good news - actually VERY good news - is that Microsoft recognized that a script task and extra steps were overkill for setting variable expressions, and created a new Variable Expressions task (Figure 5) in the Control Flow. This task allows you to use the SSIS expression language to manipulate SSIS variables at the package level, instead of the script level. While certainly not the largest enhancement in SSIS 2002, I personally find this a very welcome new task.

Figure 5: The new Variable Expressions task maintains SSIS variables without the need for an SSIS script task.
Figure 5: The new Variable Expressions task maintains SSIS variables without the need for an SSIS script task.

Tip 5: An UNDO Feature (Finally!)

It does seem a bit strange to praise a new feature that you would assume already exists in a product - but until SQL Server 2012, SSIS never had an undo/redo feature in the development environment. Unless you’re a perfect developer and a perfect typist (I’m certainly not, as my co-workers will attest), an undo feature is critical. It has existed in the development environment in SSRS for years and now exists in SSIS.

Tip 6: New SSIS Expression Language Features

The SSIS Expression Language (which is a cross between Visual Basic and C# syntax) has some new functions in SSIS 2012, as shown in Figures 6 and 7.

Figure 6: You can handle NULL values easily with the new REPLACENULL function.
Figure 6: You can handle NULL values easily with the new REPLACENULL function.
Figure 7: The new SSIS function identifies a specific token or returns the number of tokens.
Figure 7: The new SSIS function identifies a specific token or returns the number of tokens.
  • REPLACENULL function (as opposed to the prior method of using an ISNULL function with the ? and : immediate operators)
  • LEFT function for retrieving the N-most characters at the beginning of an expression
  • TOKEN (to parse a string and return a specific token) and TOKENCOUNT (to parse a string and return the number of tokens)

Tip 7: Support for ODBC Data Sources and Data Flow Destinations

Those who have tried to integrate SSIS with ODBC connections will be happy to learn that SSIS 2012 contains new native ODBC data flow components. This is related to Microsoft’s announcement that it will drop support for OLE DB after SQL Server 2012 in favor of ODBC.

SSIS 2012 allows you to create an ODBC connection manager, specifying either a user or system DSN, or a custom ODBC connection string.

Tip 8: Greatly Improved Recovery from Data Lineage and Invalid Metadata Reference Issues

Imagine that you give a young child a red lollipop - then a minute later you take away the lollipop. As the parent of a toddler, I know that even if you give the child a new red lollipop, the child will throw a fit.

Data Flow components prior to SSIS 2012 behaved similarly. For instance, suppose you have an OLE DB destination that expects ten columns from the data flow pipeline from the previous component. Now suppose that you remove one of the columns from the prior component (perhaps one of the ten columns isn’t used any longer). The OLE DB destination prior to SSIS 2012 complained and generated an error because of the invalid reference.

Alternatively, suppose the OLE DB destination expected 10 columns from a previous data flow component (component A), but now receives the same columns from a different component (component B). The OLE DB Destination component prior to SSIS 2012 still complained that the lineage of the columns was from a different parent component.

Correcting these issues always meant a certain amount of surgery on the component in error, to force it to recognize the changes. SSIS always materialized both the pipeline and the parent lineage into each subsequent component in the pipeline. Each component contained specific information about what columns it expected and where they came from - and didn’t respond well to changes.

This, like other issues in prior versions of SSIS, was something that new developers had trouble grasping, and experienced developers just simply lived with.

The good news is that Microsoft has addressed this problem in SSIS 2012, and corrections to invalid metadata in a component are now much easier.

Figures 8 and 9 demonstrate an example of this: a component (a flat file destination) expects a certain number of components, and then we remove one of the components from a previous data flow. The pipeline still generates an error, but we can use a better interface (Figure 9) to resolve any invalid pipeline references.

Figure 8: You can resolve invalid references in the pipeline using the new interface.
Figure 8: You can resolve invalid references in the pipeline using the new interface.
Figure 9: This is the new Resolve Invalid Data Flow Pipeline References Editor.
Figure 9: This is the new Resolve Invalid Data Flow Pipeline References Editor.

Tip 9: Data Taps

Choosing a Baker’s Dozen Spotlight feature for SSIS 2012 was a tough choice for me, as SSIS 2012 has so many great new features. The runner-up is SSIS Data Taps.

Suppose you have an SSIS package that runs overnight. You want a package to always write out the contents of a particular data flow to a flat file. In prior versions of SSIS, you could do this, but it meant introducing a new flat file destination into the package and possibly reworking the execution steps in the data flow. While possible, this wasn’t a great solution.

Fortunately, SSIS 2012 provides a more powerful and more elegant way to hook into (or “tap”) a specific data flow pipeline, and write the contents of the pipeline to a specific destination.

Step 1: In the data flow pipeline (Figures 10 and 11), identify the specific identification string for the specific pipeline.

Figure 10: To build a Data Flow Tap in SQL Server, you must first determine the identification string of the pipeline.
Figure 10: To build a Data Flow Tap in SQL Server, you must first determine the identification string of the pipeline.
Figure 11: You can easily find the identification string for the pipeline (for use in a Data Tap).
Figure 11: You can easily find the identification string for the pipeline (for use in a Data Tap).

Step 2: Deploy the package to the SSIS server database. (I’ll talk about this in the next tip.)

Step 3: In the SSIS server database, create an instance of an execution by using the SSIS system stored procedure called [catalog].[create_execution] (Listing 1). This returns an execution ID instance (basically an integer handle). Then use that instance ID to create a data tap by calling the SSIS system stored procedure [catalog].[add_data_tap] (also in Listing 1).

When the package executes on the server (through scheduling with SQL Server Agent or via any other execution method), the package always writes out a CSV file based on the output file referenced in the data tap!

Tip 10: New Tasks for Change Data Capture

SQL Server 2008 introduced Change Data Capture, a feature in the database engine to automate the capture of changes and the logging of inserts/updates/deletes to audit trail history tables. SSIS 2012 introduces new control flow and data flow tasks for managing CDC processes and for reading data from CDC tables.

The control flow task is the CDC Control task, which allows an ETL developer to control the lifecycle of CDC processes.

There are two CDC data flow components. The first is the CDC source, which allows you to open a CDC change tracking log table and read the rows into the pipeline. The second is the CDC splitter, which separates the rows in a change tracking table into three distinct pipelines: new rows from the change tracking table, updated rows (with the before and after values from the updates), and deleted rows.

Tip 11: A New Deployment Feature

Package deployment in prior versions of SSIS, while certainly functional, has always been a bit of pomp and circumstance. Fortunately, SSIS 2012 offers a brand new method of deploying packages to an SSIS Database catalog.

First, you need to configure the instance of SQL Server. In Figure 12, you create a new SSIS Database Catalog before deploying SSIS Projects.

Figure 12: You must create a new SSIS Database Catalog before deploying SSIS Projects.
Figure 12: You must create a new SSIS Database Catalog before deploying SSIS Projects.

Next, you need to enable CLR integration as part of creating the SSIS database catalog (Figure 13).

Figure 13: To Create SSIS Database Catalog, you must enable CLR integration and provide a password.
Figure 13: To Create SSIS Database Catalog, you must enable CLR integration and provide a password.

Finally, back in SSDT (Visual Studio), you can deploy the project and all of the packages (Figure 14).

Figure 14: The new SSIS project deploy screen deploys SSIS projects to the SSISDB Catalog.
Figure 14: The new SSIS project deploy screen deploys SSIS projects to the SSISDB Catalog.

Tip 12: New Management Dashboard Feature


Once packages have been deployed to the new SSIS database catalog, you can access the catalog database and any package options (Figure 15**),** redefine parameters (Figure 16), to redefine connection managers (Figure 17), and generate reports on package execution (Figure 18).

Figure 17: The SSIS Catalog database package options are used to redefine connection managers.
Figure 17: The SSIS Catalog database package options are used to redefine connection managers.
Figure 18: This report shows activity on the package execution.
Figure 18: This report shows activity on the package execution.
Figure 15: The SSIS Catalog database lists useful options after package deployment.
Figure 15: The SSIS Catalog database lists useful options after package deployment.
Figure 16: The SSIS Catalog database package options help you redefine parameters.
Figure 16: The SSIS Catalog database package options help you redefine parameters.

Additionally, you can read about options to author reports against the SSIS Catalog database: http://blogs.msdn.com/b/mattm/archive/2011/08/01/report-authoring-on-the-ssis-catalog.aspx.

Tip 13: The Baker’s Dozen Potpourri - Miscellaneous New Features in SSIS 2012

In addition to all the major features above, SSIS 2012 has plenty of additional features that further bolster the case for SSIS 2012 being a major and important new version. Here are some of the other features new in SSIS 2012:

  • In SSIS 2005, there were many areas in the User Interface where you had to type out a variable (as opposed to selecting from a list). SSIS 2008 took care of most of those areas, but left a small number unaddressed. SSIS 2012 has finally covered all the areas where a variable needs to be referenced.
  • SSIS 2012 makes it easier to create a data viewer (fewer keystrokes).
  • You can now populate a data flow row count as fast as a Nolan Ryan fastball. (Feel free to search for Nolan Ryan!)
  • Expression Result Length > 4000.
  • SSIS 2012 allows developers to set breakpoints as part of Script component debugging. Additionally, Microsoft upgraded the scripting engine to VSTA 3.0. The SSIS Team Blog talks more about this here: http://blogs.msdn.com/b/mattm/archive/2012/01/13/script-component-debugging-in-ssis-2012.aspx.
  • The Merge and Merge Join Transformations now use less memory than before. As a result, developers no longer need to set the MaxBuffersPerInput property (which was necessary to avoid consuming excess memory). You can now change the scope of a variable.

Final Note: Is SSIS as Great as Advertised? Yes!

As an instructor, one of the things I cover with students is the value of SQL Server Integration Services (SSIS). It’s always important for people to understand the benefit of the tool - and an example-driven approach can go a long way toward seeing how SSIS can help with many data processing requirements.

Figures 1 and 2 (along with Listings 2 and 3) show the control flow and data flow for a stripped-down version of a production SSIS package. The package does the following:

  • Truncates a (staging) table that the package uses to temporarily hold new incoming data
  • Retrieves a variable number of CSV (text) files from an FTP server (in this demo, currency exchange rate data, such as daily exchange rates from US to Mexico, US to Japan, etc.)
  • Dynamically loops through the CSV files (where you don’t know the names of the files at design-time), opens the contents, performs some validations (such as checking that the currency codes are valid), and then inserting the data into the staging table
  • If the number of files processed was greater than zero, calls a T-SQL stored procedure that utilizes a MERGE statement. The MERGE statement reads both the staging table and the production exchange rate table, and performs two actions: inserts any rows that exist in the staging table but not the production table, and updates any rows where the rates have actually changed.

Just like other development paradigms have design patterns, this SSIS package represents a common SSIS design pattern: loading multiple sets of files into a temporary table, then using a T-SQL MERGE statement to insert/update the data. The alternate approach of inserting/updating individual rows to the production table will likely perform worse, compared to a single MERGE statement against a large set of data.

Next Time Around in the Baker’s Dozen

There are plenty of new features in the Analysis Services portion of SQL Server 2012, and I plan to cover them later in the year. However, the next installment of the Baker’s Dozen will cover 13 different tips for optimizing T-SQL queries in SQL Server.

Listing 1: SQL code in the SSIS service engine for Data taps

USE [SSISDB]

DECLARE @return_value int, @execution_id bigint, 
DECLARE @data_tap_id bigint 

EXEC  [catalog].[create_execution]
      @folder_name = N'SSIS2012DemoProjectFolder',
      @project_name = N'SSIS2012DemoProject',
      @package_name = N'ETLMergeExample.dtsx.dtsx',
      @execution_id = @execution_id OUTPUT

EXEC [catalog].[add_data_tap]
     @execution_id = @execution_id,
       @task_package_path =
   N'\Package\Foreach Loop Container\Data Flow’ + 
     ‘ - Process CSV filesProduct',
       @dataflow_path_id_string = 
         N'Paths[Data Conversion.Data Conversion Output]',
       @data_filename = N'OutputDatatap.csv',
       @data_tap_id = @data_tap_id OUTPUT

Listing 2: Script to create staging table

use AdventureWorks2008R2
go

if exists( select * from sys.objects where object_id = 
     object_id('dbo.TempStagingCurrencyRates'))
         DROP TABLE [dbo].[TempStagingCurrencyRates] 
GO

CREATE TABLE [dbo].[TempStagingCurrencyRates]  (
   [CurrencyRateDate] [datetime] NOT NULL,
   [FromCurrencyCode] [nchar](3) NOT NULL,
   [ToCurrencyCode] [nchar](3) NOT NULL,
   [AverageRate] [money] NOT NULL,
   [EndOfDayRate] [money] NOT NULL
)  
GO 

Listing 3: T-SQL Script to create a MERGE

USE [AdventureWorks2008R2]
GO

CREATE PROCEDURE [dbo].[MergeTempCurrencyRates]
as
begin
  DECLARE @MergeActions TABLE   (ActionName varchar(10) )
  MERGE Sales.CurrencyRate as T
     using [dbo].[TempStagingCurrencyRates] as S 
            on T.CurrencyRateDate = S.CurrencyRateDate and
               T.FromCurrencyCode = S.FromCurrencyCode and
               T.ToCurrencyCode = S.ToCurrencyCode

     when not matched 
        then insert
         (CurrencyRateDate, FromCurrencyCode, 
           ToCurrencyCode,AverageRate, EndOfDayRate)
          Values
          (S.CurrencyRateDate, S.FromCurrencyCode,  
          S.ToCurrencyCode, S.AverageRate, S.EndOfDayRate)

     when matched and (S.AverageRate <> T.AverageRate OR 
                       S.EndOfDayRate <> T.EndOfDayRate )
         then update set T.AverageRate = S.AverageRate, 
                         T.EndOfDayRate = S.EndOfDayRate
   OUTPUT $action  as ActionName into @MergeActions ;
   
-- bring back the # of insertions and the # of updates,
-- so that the SSIS can read them into 2 variables   

   DECLARE @NumInserts INT, @NumUpdates INT
   SET @NumInserts = (select COUNT(*) FROM @MergeActions
                   WHERE ActionName = 'INSERT')

   SET @NumUpdates = (SELECT COUNT(*) FROM @MergeActions 
                   WHERE ActionName = 'UPDATE')
   
   SELECT @NumInserts as NumInserts,
          @NumUpdates AS NumUpdates
END

GO