The first part of this book introduces the Office object models and the Office primary interop assemblies (PIAs). You also learn how to use Visual Studio to build automation executables, add-ins, and code behind the document by using features of Visual Studio 2008 Tools for Office 2007 (VSTO 3.0).

  • Chapter 1, “An Introduction to Office Programming,” introduces the Office object models and examines their basic structure. The chapter describes how to work with objects, collections, and enumerations-the basic types used in all Office object models. You learn how to use properties, methods, and events exposed by objects and collections in the Office object models. Chapter 1 also introduces the PIAs, which expose the Office object models to .NET code, and describes how to use and reference Office PIAs in a VSTO project.
  • Chapter 2, “Introduction to Office Solutions,” covers the main ways Office applications are customized and extended. The chapter describes the various kinds of Office solutions you can create using VSTO.

The Other Parts of This Book

Part II: Office Programming in .NET

Part II of this book covers the Office object models in more depth. Chapters 3 through 5 cover Excel, Chapters 6 through 8 cover Word, and Chapters 9 through 11 cover Outlook. These chapters also include some discussion of application-specific features and issues. Chapter 3, for example, talks about how to build custom formulas in .NET for Excel, and Chapter 5 discusses the Excel-specific locale issue in some detail. You can select which chapters of Part II to read; if you are interested only in Excel development, you can read Chapters 3 through 5 and then skip to Part III of this book.

Part III: Office Programming in VSTO

Part III of this book, comprised of Chapters 12 through 21, describes the features that Visual Studio 2008 Tools for Office 2007 brings to Office development. This part describes all the features of VSTO, including using Windows Forms and Windows Presentation Foundation controls in Excel and Word documents, using data binding against Office objects, building Smart Tags, and adding Windows Forms and Windows Presentation Foundation controls to Office’s task pane.

An Introduction to Office Programming

Why Office Programming?

The family of Office 2007 applications covered by this book (Excel 2007, Word 2007, and Outlook 2007) represents an attractive platform on which to build solutions. You can customize and extend Office applications by developing solutions against their object models. By building a solution using the Office system, you can reuse some of the most feature-rich and popular applications available. A solution that analyzes or displays data can take advantage of the formatting, charting, calculation, and analysis features of Excel. A solution that creates documents can use the capability of Word to generate, format, and print documents. A solution that manipulates business information can present it in an Outlook folder or in an InfoPath form. It is far better to reuse the applications that you already know than to build these features from scratch.

Office Business Applications

Information workers use the Office environment on a daily basis. A solution built using Office can become a seamless part of that environment. Too frequently, users must go to a Web page or some other corporate application to get data that they want to cut and paste into an Excel workbook or a Word document anyway. Many users want to use Outlook as their business information portal. By integrating a solution with Office, you enable users to get the information they need without having to switch to another application.

Custom applications that bring business data into Office and SharePoint are called Office Business Applications or OBAs. Microsoft continues to make it easier to integrate line of business data into the Office system through advances in the Office platform. The most recent example of this is the new Business Data Catalog (BDC) in SharePoint that can be extended to bring custom business data into SharePoint. Within Office client applications like Excel and Word, you can use data binding to bring line of business data into Excel and Word documents.

Offfice Programming and the Professional Developer

Historically, most Office programming has been done via Visual Basic for Applications (VBA) and the macro recording features built in to some Office applications. Users would record a macro to automate a repetitive task within an Office application. Sometimes the code created by recording a macro would be further modified using VBA and turned into more complicated departmental solutions-often by users who were not trained as programmers and whose primary job was not programming. These solutions would sometimes make their way up the corporate food chain and get taken over by professional developers and turned into business solutions.

Professional developers had a different set of needs from VBA developers-they wanted the ability to scale an Office solution to an entire enterprise. They wanted better support for updating Office solutions once they were deployed. They wanted to use languages like C# and even have solutions that mixed several languages. They also wanted features that supported team development, like source code control.

Why .NET for Office?

The .NET Framework and its associated class libraries, technologies, and languages address many of the concerns that professional developers had with Office development. Today’s Office development can be done using Visual Studio 2008, which is a rich programming environment for professional developers. Developers can use .NET languages such as Visual Basic or C#. The Office PIAs (Primary Interop Assemblies) allow .NET code to call the unmanaged object models that Office applications expose. The rich .NET class libraries enable developers to build Office solutions using technologies such as the Windows Presentation Foundation (WPF) to show user interface (UI) and the Windows Communication Foundation (WCF) to connect to corporate data servers.

Why Visual Studio Tools for Office?

Visual Studio 2008 Tools for Office 2007 (VSTO 3.0) provides .NET programming support for Word, Excel, Outlook, PowerPoint, Project, Visio, and InfoPath in Visual Studio. VSTO turns the Word or Excel document being programmed against into a .NET class, replete with data binding support, controls that can be coded against much like a Windows Forms control, and other .NET features. It makes it easy to integrate .NET code into Outlook. It enables developers to put .NET code behind InfoPath forms. Developers can program against key Office objects without having to traverse the entire Office object model.

Table 1-1 shows the Office applications that are supported by VSTO 3.0 and the various VSTO features that are enabled for each Office application. In general, VSTO has the most features and support for Word, Excel, and Outlook. This book only covers these three applications, although much of what you learn in this book about add-ins, custom task panes, the visual Ribbon designer, and ClickOnce can be applied to PowerPoint, Visio, InfoPath, and Project development.

VSTO 3.0 has now been integrated directly into Visual Studio Professional and higher level SKUs where in the past it was only available as a standalone product or with Visual Studio Team System. Office programming is now available to more developers than ever before. VSTO 3.0 also supports Office 2003-in particular it supports the same set of Document Level projects and Application Level Add-in Projects shown in Table 1-1 (except for Application Level Add-ins for InfoPath 2003). For more on Office 2003 support in VSTO, consult the first edition of this book.

How .NET Is It?

This book discusses many new .NET ways of programming against Office applications. However, some aspects of Office programming remain awkward using .NET. Most of these awkward areas are attributable to the fact that the Office object models were designed to work with a technology called COM. Although .NET code can talk to the Office object models via PIAs, the object models sometimes do not feel very .NET-friendly. Furthermore, the Office object models do not always follow the naming conventions or design patterns of classes that were designed for .NET.

In the future, many of the Office object models will likely be redesigned for .NET, and the object models will feel friendlier to a .NET developer. For now, developers must live in a transitional period in which some aspects of Office programming feel like they were designed for .NET and other aspects do not. This book discusses some of the most difficult problems developers encounter when using .NET with Office and how to work around these problems.

One recent advance that makes programming against Office object models more palatable to the .NET developer are the Office Interop API extensions (part of the VSTO Power Tools available here: http://www .microsoft.com/downloads/details.aspx?FamilyId=46B6BF86-E35D-4870-B214-4D7B72B02BF9&displaylang=en). These are a set of classes that help the C# developer write more elegant code when using parameterized properties and optional or named parameters. Because these extensions aren’t in wide use yet, this book will show coding directly against the PIAs. But the Office Interop API extensions are an attractive option for the developer looking for more .NET-friendly APIs.

Office Object Models

Almost all Office programming involves writing code that uses the object model of an Office application. The object model is the set of objects provided by the Office application that running code can use to control the Office application. The object model of each Office application is organized hierarchically with the object called Application forming the root of the hierarchy. From the Application object, other objects that make up the object model of the Office application can be accessed.

As an example of how object model objects are related in the object model hierarchy, Figure 1-1 shows some of the most important objects in the Word object model. The root object is the Application object. Also shown in this diagram are some other objects, including Documents, Document, Paragraphs, and Paragraph. The Application object and Documents object are related because the Documents object is returned via a property on the Application object. Other objects are not directly accessible from the root Application object, but are accessible by traversing a path. For example, the Paragraphs object is accessed by traversing the path from Application to Documents to Document to Paragraphs. Figure 1-2 shows a similar diagram for some major objects in the Excel object model hierarchy.

Objects

Each Office application’s object model consists of many objects that you can use to control the Office application. Word has 294 distinct objects, Excel has 243, and Outlook has 139. Objects tend to correspond to features and concepts in the application itself. For example, Word has objects such as Document, Bookmark, and Paragraph-all of which correspond to features of Word. Excel has objects such as Workbook, Worksheet, Chart, and Series-all of which correspond to features of Excel. As you might suppose, the most important and most used objects in the object models are the ones that correspond to the application itself, the document, and key elements in a document such as a range of text in Word. Most solutions use these key objects and only a small number of other objects in the object models. Table 1-2 lists some of the key objects in Word, Excel, and Outlook along with brief descriptions of what these objects do.

Where objects in an Office object model start to differ from typical .NET classes is that the vast majority of object model objects are not creatable or “new-able.” In most Office object models, the number of objects that can be created by using the new keyword is on the order of one to five objects. In most Office solutions, new will never be used to create an Office object-instead, an already created Office object, typically the root Application object, is passed to the solution.

New in Office 2007

Many new objects have been added in Office 2007. Word 2007 now has 54 more objects than it did in Word 2003, for example. Many of those new objects allow programmatic access to the new equation-editing features in Word 2007. Also added in Word are object model objects for the new Building Blocks and content controls features. Excel 2007 has 51 new objects. Some of the objects that were added support the new conditional formatting rules in Excel 2007, as well as work with page layout and pivot tables.

Because most Office object model objects cannot be created directly, they are instead accessed via the object model hierarchy. For example, Listing 1-1 shows how to get a Worksheet object in Excel starting from the Application object. This code is a bit of a long-winded way to navigate the hierarchy because it declares a variable to store each object as it traverses the hierarchy. The code assumes that the root Excel Application object has been passed to the code and assigned to a variable named app. It also uses C#’s as operator to cast the object returned from the Worksheets collection as a Worksheet, which is necessary because the Worksheet collection is a collection of object for reasons described in Chapter 3, “Programming Excel.”

If the code does not need to cache each object model object in a variable as it goes but only needs to get a Worksheet object, a more efficient way to write this code is as follows:

Excel.Worksheet myWorksheet2 = app.Workbooks.get_Item(1).
Worksheets.get_Item(1) as Excel.Worksheet;

Collections

Paragraphs and Documents are examples of a type of object called a collection. A collection is a specialized object that represents a group of objects. Typically, a collection is named so that its name is the plural of the type of the object it contains. For example, the Documents collection object is a collection of Document objects. Some collection objects may be collections of a value type such as a string.

Collections typically have a standard set of properties and methods. A collection has a Count property, which returns the number of objects in the collection. A collection also has an Item method, which takes a parameter, typically a number, to specify the index of the desired object in the collection. A collection may have other properties and methods in addition to these standard properties and methods.

Listing 1-2 shows iteration over a collection using the Count property of the collection and the Item method of the collection. Although this is not the preferred way of iterating over a collection (you typically use foreach instead), it does illustrate two key points. First, collections in Office object models are almost always 1-based, meaning they start at index 1 rather than index 0. Second, the parameter passed to the get_Item method is often passed as an object so you can either specify a numeric index as an int or the name of the object within the collection as a string.

Why get_Item?

If you are new to Office development in C#, you may be surprised to see Listing 1-2 use a strange method called get_Item rather than just a property called Item. This is a result of C#’s lack of support for properties that take parameters (like the Item property on Excel’s Workbooks collection)-C# must convert the parameterized property to a method which it prefixes with get_ or set_. For more information on parameterized properties, see the section “Parameterized Properties” later in this chapter.

If you were to look at the definition for the Workbooks collection’s get_Item method, you would see that it takes an object parameter. Even though the get_Item method takes an object parameter, we pass an int value and a string value to it in Listing 1-2. This works because C# can automatically convert a value type such as an int or a string to an object when you pass the value type to a method that takes an object. This automatic conversion is called boxing. C# automatically creates an object instance known as a box to put the value type into when passing it to the method.

The preferred way of iterating over a collection is using the foreach syntax of C#, as shown in Listing 1-3.

Sometimes you may want to iterate over a collection and delete objects from the collection by calling a Delete method on each object as you go. This is a risky practice because behavior of a collection in the Office object models is sometimes undefined if you are deleting items from it as you iterate over it. Instead, as you iterate over the Office object model collection, add the objects you want to delete to a .NET collection you have created, such as a list or an array. After you have iterated over the Office object model collection and added all the objects you want to delete to your collection, iterate over your collection and call the Delete method on each object.

Enumerations

An enumeration is a type defined in an object model that represents a fixed set of possible values. The Word object model contains 292 enumerations, Excel 230, and Outlook 145.

As an example of an enumeration, Word’s object model contains an enumeration called WdWindowState. WdWindowState is an enumeration that has three possible values: wdWindowStateNormal, wdWindowStateMaximize, and wdWindowStateMinimize. These are constants you can use directly in your code when testing for a value. Each value corresponds to an integer value. (For example, wdWindowStateNormal is equivalent to 0.) However, it is considered bad programming style to make comparisons to the integer values rather than the constant names themselves because it makes the code less readable.

Objects in an Office application’s object model are written in unmanaged code (typically C++ classes which are exposed as COM objects). Your .NET code can access the unmanaged object models of Office through .NET classes defined in a special .NET assembly called a Primary Interop Assembly or PIA. An object in the object model is required to have at least one property, method, or event. Most of the objects in an Office application’s object model have several properties, a few methods, and no events. In Office object models, properties predominate, followed by methods, and trailed distantly by events. The most important objects in the object model, such as Application and Document, are typically much more complex and have a much larger number of properties and methods as well as events. For example, Word’s Application object has about 114 properties, 65 methods, and 28 events. Table 1-3 lists some of the properties, methods, and events on the Word Application object to give a sense of the types of functionality an object model object provides.

Properties

Properties are simple methods which allow you to read or write particular named values associated with an object. For example, Word’s Application object has a property called CapsLock, which returns a bool value. If the Caps Lock key is down, it will return true; if the Caps Lock key is up, it will return false. Listing 1-4 shows some code that examines this property. The code assumes that the root Application object of the Word object model has already been assigned to a variable called app.

Another thing to note about the CapsLock property is that it is a read-only property. That is, you cannot write code that sets the CapsLock property to false; you can only get the value of the CapsLock property. If you try to set a read-only property to some value, an error will occur when you compile your code.

The CapsLock property returns a bool value. It is also possible for a property to return an enumeration. Listing 1-5 shows some code that uses the WindowState property to determine whether Word’s window is maximized, minimized, or normal. This code uses C#’s switch statement to evaluate the WindowState property and compare its value to the three possible enumerated value constants. Notice that when you specify enumerated values in C#, you must specify both the enumerated type name and the enumerated value-for example, if you just used wdWindowStateNormal rather than WdWindowState.wdWindowStateNormal the code will not compile.

Properties can also return other object model objects. For example, Word’s Application object has a property called ActiveDocument that returns the currently active document-the one the user is currently editing. The ActiveDocument property returns another object in the Word object model called Document. Document in turn also has properties, methods, and events. Listing 1-6 shows code that examines the ActiveDocument property and then displays the Name property of the Document object.

What happens if there is no active document-for example, if Word is running but no documents are opened? In the case of the ActiveDocument property, it throws an exception. So a safer version of the preceding code would catch the exception and report no active document was found. Listing 1-7 shows this safer version. An even better approach is to check the Count property of the Application object’s Documents collection to see whether any documents are open before accessing the ActiveDocument property.

Object models sometimes behave differently in an error case in which the object you are asking for is not available or does not make sense in a particular context. The property can return a null value. The way to determine whether an object model property will throw an exception or return a null value is by consulting the object model documentation for the property in question. Excel’s Application object uses this pattern for its ActiveWorkbook property. If no Excel workbook is open, it returns null instead of throwing an exception. Listing 1-8 shows how to write code that handles this pattern of behavior.

Parameterized Properties

The properties examined so far in this section are parameterless. However, some properties require parameters. For example, Word’s Application object has a property called FileDialog that returns a FileDialog object. The FileDialog object is an object that is shared across all the Office applications. The FileDialog property takes an enumeration parameter of type MsoFileDialogType, which is used to pick which FileDialog is returned. Its possible values are msoFileDialogOpen, msoFileDialogSaveAs, msoFileDialogFilePicker, and msoFileDialogFolderPicker.

C# does not support calling parameterized properties as properties. When you go to use the Word object model from C# and look for the FileDialog property on Word’s Application object, it is nowhere to be seen. The FileDialog property is callable from C#, but only via a method-the method is named get_FileDialog. So when you are looking for a parameterized property in C#, be sure to look for the get_Property method (where Property is the name of the property you want to access). To set parameterized properties in C# (assuming they are not read-only properties), there is a separate method called set_Property (where Property is the name of the property you are going to set).

An exception to this is found when using VSTO document-level projects for Word and Excel. A handful of Word and Excel object model objects are extended by VSTO. These objects have been extended to give you a different way of accessing a parameterized property-via an indexer. An indexer enables you to access the property in the same way you would access an array-with the name of the property followed by a parameter list between the delimiters [ and ]. So for an object model object extended by VSTO, such as Worksheet, a parameterized property such as Range, which takes two parameters, can be called using the indexer syntax: Range[parameter1, parameter2] instead of get_Range(parameter1, parameter2).

The code in Listing 1-9 uses the FileDialog property called as a method and passes msoFileDialogFilePicker as a parameter to specify the type of FileDialog object to be returned. It then calls a method on the returned FileDialog object to show the dialog box.

The Office object models also have properties that have optional parameters. Optional parameters are parameters that can be omitted and the Office application will fill in a default value for the parameter. Optional parameters are typically of type object because of how optional parameters are passed to the underlying COM API. In C#, you must pass a special value to optional parameters that are of type object if you do not want to specify the parameter. This special value is called System.Type.Missing, and it must be passed for optional parameters that you do not want to specify directly (unlike Visual Basic in which you can omit the parameter entirely). In VSTO projects, a “missing” variable is predeclared for you (that is set to System.Type.Missing). Therefore, in VSTO code, you will often see missing passed rather than System.Type.Missing.

Occasionally, you will find an optional parameter is of some enumeration type rather than of type object. For this kind of optional parameter, you cannot pass System.Type.Missing and must instead pass a specific enumerated type value. You can find out what the default enumerated type value is for the optional parameter by consulting the documentation for the method or by using the object browser in a Visual Basic project-unfortunately, the C# object browser does not show the default value for an optional enumerated type parameter.

Listing 1-10 shows an example of calling a parameterized property called Range, which is found on Excel’s Application object. The Range property is accessed via the get_Range method because parameterized properties can only be called via a method in C#. Calling the get_Range method on Excel’s Application object returns the Range object in the active workbook as specified by the parameters passed to the method. The get_Range method takes two parameters. The first parameter is required, and the second parameter is optional. If you want to specify a single cell, you just pass the first parameter. If you want to specify multiple cells, you have to specify the top-left cell in the first parameter and the bottom-right cell in the second parameter.

In Word, optional parameters are handled differently than in the other Office applications. Word’s object model requires that optional parameters be passed by reference. This means that you cannot pass System.Type.Missing directly as the code in Listing 1-10 did. Instead, you must declare a variable, set it to System.Type.Missing, and pass that variable by reference. You can reuse the same declared variable that has been set to System.Type.Missing if a parameter has multiple parameters you want to omit. In a VSTO project, you can just pass by reference the missing variable that is predeclared for you. Listing 1-11 shows how to specify optional parameters in Word. In this example, the code uses a parameterized property from Word’s Application object called SynonymInfo, which has a required string parameter to specify a word you want a synonym for and an optional parameter to specify the language ID you want to use. The SynonymInfo property is accessed via the get_SynonymInfo method because parameterized properties can only be called via a method in C#. By omitting the optional language ID parameter and passing by reference a variable set to System.Type.Missing, Word will default to use the current language you have installed.

Properties Common to Most Objects

Because all the object model objects have object as their base class, you will always find the methods GetType, GetHashCode, Equals, and ToString on every object model object. In addition to these methods that are required by .NET, the Office object model follows a pattern that results in the frequent appearance of Application, Creator, and Parent properties. The Application property returns the Application object associated with the Office object model object. This is provided as a quick way to get back to the root of the object model. Many Office object model objects have a property called Creator, which gives you a code indicating which application the object was created in. Finally, you will often find a Parent property that returns the object that is the parent in the object model hierarchy.

Methods

A method is typically more complex than a property and represents a “verb” on the object that causes something to happen. It may or may not have a return value and is more likely to have parameters than a property.

The simplest form of a method has no return type and no parameters. Listing 1-12 shows the use of the Activate method from Word’s Application object. This method activates the Word application, making its window the active window (the equivalent of clicking the Word window in the taskbar to activate it).

Methods may also have parameters and no return type. Listing 1-13 shows an example of this kind of a method. The ChangeFileOpenDirectory method takes a string that is the name of the directory you want Word to default to when the Open dialog box is shown. For a method this simple, you might wonder why a property was not used instead-for example, you can imagine Word having a FileOpenDirectory property. In this case, the ChangeFileOpenDirectory only changes the default open directory temporarily-for the lifetime of the current Word session. When you exit Word and then restart Word, the default will no longer be what you set with this method. One reason that object models sometimes use a simple method such as this rather than a property is because some values exposed in an object model are “write-only”; that is, they can only be set but cannot be read. It is common to create a read-only property, but not common to create a write-only property. So when a write-only property is needed, a simple method is often used instead.

Methods can have no parameters and a return type. Listing 1-14 shows an example of this kind of a method. The DefaultWebOptions method returns the DefaultWebOptions object, which is then used to set options for Word’s Web features. In this case, DefaultWebOptions really should have been implemented as a read-only property as opposed to a method.

Methods can have parameters and a return type. Listing 1-15 shows an example of this kind of a method. The CentimetersToPoints method takes a centimeter value and converts it to points, which it returns as the return value of the method. Points is a unit often used by Word when specifying spacing in the document.

Methods can also have optional parameters. Optional parameters do not need to be specified directly to call the method. For any parameters you do not want to specify, you pass a special value defined by .NET called System.Type.Missing. Listing 1-16 shows a method called CheckSpelling in Excel that has optional parameters. Listing 1-16 illustrates the syntax you use to omit parameters you do not want to specify. The CheckSpelling method takes a required string-the word you want to check the spelling of-along with two optional parameters. The first optional parameter enables you to pick a custom dictionary to check the spelling against. The second optional parameter enables you to tell the spell checker to ignore words in all uppercase-such as an acronym. In Listing 1-16, we check a phrase without specifying any of the optional parameters-we pass System.Type.Missing to each optional parameter. We also check a second phrase that has an acronym in all uppercase so we pass System.Type .Missing to the first optional parameter because we do not want to use a custom dictionary, but we specify the second optional parameter to be true so the spell checker will ignore the words in all uppercase.

Optional Parameters in Word

Optional parameters in Word can produce some strange-looking C# code because the values passed to optional parameters must be passed by reference. For example, Listing 1-17 shows how to spell check a string using the Word object model in C#.

The first thing that probably comes to mind if you are a Visual Basic programmer and you have never seen code written against Word in C# is “Why is this so verbose?” Visual Basic does some special things for you when there are optional arguments in a method, so the Visual Basic version of this is much simpler, as shown in Listing 1-18.

In Visual Basic, you do not have to worry about passing a value for each optional argument-the language handles this for you. You can even use commas, as shown in Listing 1-18, to omit one particular variable you do not want to specify. In this case, we did not want to specify a custom dictionary, but we did want to pass a value for the parameter IgnoreUpperCase, so we omitted the custom dictionary argument by just leaving it out between the commas. It is also possible to use named parameters to specify only the parameters you want to supply. Note that some of this functionality provided natively to Visual Basic is made available when C# programmers use the Office Interop API Extensions.

The first thing that probably comes to mind if you’re a C# programmer and you have never seen code written against Word in C#, such as the code shown in Listing 1-17, is “Why is all that stuff passed by reference?” When you are talking to Office object model methods, properties, and events, you are talking to the object model through a .NET technology called COM interop (short for interoperate). The Office object models are all written in unmanaged code (C and C++) that is exposed via COM interfaces. You will read more detail later in this chapter about the technology called interop assemblies that allows managed code to call COM objects.

If you were to examine the COM definition for the CheckSpelling method used in Listing 1-17 as defined by Word’s COM type library, you would see something like this:

HRESULT CheckSpelling(
    [in] BSTR Word,
    [in, optional] VARIANT* CustomDictionary,
    [in, optional] VARIANT* IgnoreUppercase,
    [in, optional] VARIANT* MainDictionary,
    [in, optional] VARIANT* CustomDictionary2,
    [in, optional] VARIANT* CustomDictionary3,
    [in, optional] VARIANT* CustomDictionary4,
    [in, optional] VARIANT* CustomDictionary5,
    [in, optional] VARIANT* CustomDictionary6,
    [in, optional] VARIANT* CustomDictionary7,
    [in, optional] VARIANT* CustomDictionary8,
    [in, optional] VARIANT* CustomDictionary9,
    [in, optional] VARIANT* CustomDictionary10,
    [out, retval] VARIANT_BOOL* prop);

Note that any parameter that is marked as optional is specified as a pointer to a VARIANT in Word (VARIANT*). A VARIANT is a type in COM that is roughly equivalent to object in .NET-it can contain many different types of values. Excel does not typically use a pointer to a VARIANT for optional parameters, so you do not have this by ref issue for most of Excel. When the PIA is generated, the C# IntelliSense looks like this:

bool _Application.CheckSpelling(string Word,
  ref object CustomDictionary,
  ref object IgnoreUppercase,
  ref object MainDictionary,
  ref object CustomDictionary2,
  ref object CustomDictionary3,
  ref object CustomDictionary4,
  ref object CustomDictionary5,
  ref object CustomDictionary6,
  ref object CustomDictionary7,
  ref object CustomDictionary8,
  ref object CustomDictionary9,
  ref object CustomDictionary10)

Because of how Word defined optional parameters in its COM objects (as pointer to a VARIANT) and because of how that translates into .NET code (an object passed by reference), any optional argument in Word has to be passed by reference from C# and has to be declared as an object. Even though you would like to strongly type the IgnoreUppercase parameter to be a bool in the CheckSpelling example, you cannot. You have to type it as an object or you will get a compile error. This ends up being a little confusing because you can strongly type the first argument-the string you want to check. That’s because in the CheckSpelling method, the Word argument (the string you are spell checking) is not an optional argument to CheckSpelling. Therefore, it is strongly typed and not passed by reference. Also note that optional arguments are always listed after all required arguments-that is, you will never find a situation where argument1 is optional and argument2 is not.

This brings us back to System.Type.Missing. In C# to omit an optional argument you pass an object by reference set to System.Type.Missing. In our example, we just declared one variable called missing and passed it in 11 times.

When you pass objects by reference to most managed functions, you do so because the managed function is telling you that it might change the value of that object you passed into the function. So it might seem bad to you that we are declaring one variable and passing it to all the parameters of CheckSpelling that we do not care about. After all, imagine you have a function that takes two parameters by reference. If you pass in the same variable set to System.Type.Missing to both parameters, what if the code evaluating the first parameter changes it from System.Type.Missing to some other value such as the bool value true? This would also affect both the first parameter and the second parameter, and the function might do something different when it looks at the second parameter that was originally set to System.Type.Missing because it is now set to true as well.

To avoid this, you might think we would have to declare a missing1 through missing11 variable because of the possibility that Word might go and change one of the by ref parameters on you and thereby make it so you are no longer passing System.Type.Missing but something else such as true that might cause unintended side effects.

Fortunately, you do not have to do this when working with Office object models. Remember that the underlying Word Application object is an unmanaged object, and you are talking to it through COM interop. The COM interop layer realizes that you are passing a System.Type.Missing to an optional argument on a COM object. So interop obliges, and instead of passing a reference to your missing variable in some way, the interop layer passes a special COM value that indicates that the parameter was missing. Your missing variable that you passed by reference is safe because it was never really passed directly into Word. It is impossible for Word to mess with your variable, even though when you look at the syntax of the call it looks like it would be possible because it is passed by reference.

Therefore, the CheckSpelling code in Listing 1-17 is completely correct. Your missing variable is safe-it will not be changed on you by Word even though you pass it by reference. But remember, this is a special case that only applies when talking through COM interop to an unmanaged object model that has optional arguments. Do not let this special case make you sloppy when calling methods on objects outside the Office object model that require parameters to be passed by reference. When talking to non-Office object model methods, you have to be careful when passing parameters by reference because the managed method can change the variable you pass.

Events

You have now read about the use of properties and methods in some detail-these are both ways that your code controls the Office application. Events are the way the Office application talks back to your code and enables you to run additional code in response to some condition that occurred in the Office application.

In the Office object models, there are far fewer events than there are methods and properties-for example, there are 41 events in Word and 85 in Excel. Some of these events are duplicated on different objects. For example, when the user opens a Word document, both the Application object and the newly created Document object raise Open events. If you wanted to handle all Open events on all documents, you would handle the Open event on the Application object. If you had code associated with a particular document, you would handle the Open event on the corresponding Document object.

In most of the Office object models, events are raised by a handful of objects. The only objects that raise events in the Word object model are Application, Document, and OLEControl. The only objects that raise events in the Excel object model are Application, Workbook, Worksheet, Chart, OLEObject, and QueryTable. Outlook is a bit of an exception: About half of the objects in the Outlook object model raise events. However, most of these objects raise the same set of events, making the total number of unique events small in Outlook as well.

Table 1-4 shows all the events raised by Excel’s Application object. This table represents almost all the events raised by Excel because events prefaced by Sheet are duplicated on Excel’s Worksheet object, and events prefaced by Workbook are duplicated on Excel’s Workbook object. The only difference in these duplicated events is that the Application-level Sheet and Workbook events pass a parameter of type Sheet or Workbook to indicate which worksheet or workbook raised the event. Events raised by a Workbook object or Sheet object do not have to pass the Sheet or Workbook parameter because it is implicitly determined from which Workbook or Sheet object you are handling events for.

To handle the events raised by Office object models, you must first declare a handler method in your code that matches the signature expected by the event being raised. For example, the Open event on the Application object in Excel expects a handler method to match the signature of this delegate:

public delegate void AppEvents_WorkbookOpenEventHandler(Workbook wb);

To handle this event, you must declare a handler method that matches the expected signature. Note that we omit the delegate keyword shown in the signature above in our handler method because we are not defining a new delegate type, just implementing an existing one defined by the Office object model.

public void MyOpenHandler(Excel.Workbook wb)
{
  MessageBox.Show(wb.Name + " was opened. ");
}

Finally, you must connect your handler method , to the Excel Application object that raises this event. We create a new instance of the delegate object defined by the Excel object model called AppEvents_WorkbookOpenEventHandler. We pass to the constructor of this object our handler method. We then add this delegate object to the Excel Application WorkbookOpen event using the += operator.

app.WorkbookOpen +=
  new AppEvents_WorkbookOpenEventHandler(MyOpenHandler);

Although this seems complex, Visual Studio 2008 helps by auto-generating most of this line of code as well as the corresponding event handler automatically. If you were typing this line of code, after you type +=, Visual Studio displays a pop-up tooltip. If you press the Tab key twice, then Visual Studio generates the rest of the line of code and the handler method automatically.

Pascal Casing in Code Generated by Visual Studio

Visual Studio determines how to name the variables by reading information from the PIA definition. The definitions of the event parameters in the PIAs use Pascal casing, wherein the first letter of each word in a parameter is capitalized (CancelDefault, for example). Proper .NET style, however, says that parameters should be Camel cased (cancelDefault, for example). In this book, we’ve changed the casing from Pascal casing, which Visual Studio generates, to Camel casing to match .NET style guidelines.

Listing 1-19 shows a complete implementation of a handler method and event hookup in a simple class. The handler method is called MyOpenHandler and is a member method of the class SampleListener. This code assumes that a client creates an instance of this class, passing the Excel Application object to the constructor of the class. The ConnectEvents method connects the handler method MyOpenHandler to the Excel Application object’s WorkbookOpen event. The DisconnectEvents method removes the handler method MyOpenHandler from the Excel Application object’s WorkbookOpen event by using the -= operator on the delegate object. It might seem strange that we create a new instance of our delegate object when removing it, but this is the way C# supports removing delegates.

The result of this code is that any time a workbook is opened and ConnectEvents has been called, MyOpenHandler will handle the WorkbookOpen event raised by Excel’s Application object and it will display a message box with the name of the workbook that was opened. DisconnectEvents can be called to stop MyOpenHandler from handling the WorkbookOpen event raised by Excel’s Application object.

The “My Button Stopped Working” Issue

One issue commonly encountered when beginning to program against Office events in .NET is known as the “my button stopped working” issue. A developer will write some code to handle a Click event raised by a CommandBarButton in the Office toolbar object model. This code will sometimes work temporarily but then stop. The user will click the button, but the Click event appears to have stopped working.

The cause of this issue is connecting an event handler to an object whose lifetime does not match the desired lifetime of the event. This typically occurs when the object to which you are connecting an event handler goes out of scope or gets set to null so that it gets garbage collected. Listing 1-20 shows an example of code that makes this mistake. In this case, an event handler is connected to a newly created CommandBarButton called btn. However, btn is declared as a local variable, so as soon as the ConnectEvents function exits and garbage collection occurs, btn gets garbage collected and the event connected to btn is not called.

The complete explanation of this behavior has to do with btn being associated with something called a Runtime Callable Wrapper (RCW). Without going into too much depth, btn holds on to an RCW that is necessary for the event to propagate from the unmanaged Office COM object to the managed event handler. When btn goes out of scope and is garbage collected, the reference count on the RCW goes down and the RCW is disposed-thereby breaking the event connection.

CommandBar Versus Ribbon

CommandBar objects are the way developers would customize Office menus and toolbars up through the Office 2003 release. Office 2007 introduces a new UI model called the Office Fluent Ribbon that is used to add menus and buttons. This new UI model also has a new programming model that is discussed in detail in Chapter 17, “Working with the Ribbon in VSTO.” Although most of the time in your Office 2007 development you will want to use the Ribbon programming model, there are still times when you need to fall back to the CommandBar style object model-for example, for customizing context menus in Outlook or for customizing Outlook’s Explorer window. Older code that uses CommandBarButtons to create custom menus and toolbars continues to work, but any custom menus or toolbars are displayed in the Add-Ins tab of the Ribbon.

Listing 1-21 shows a second example of a failed event listener class that is attempting to connect to Outlook’s NewInspector event, which is raised by Outlook’s Inspectors object. This event is raised whenever an inspector window opens (a window where you are viewing or editing an Outlook item). This code will also fail to get any events. In this case, it is more subtle because the event handler is connected to the Inspectors object, which is temporarily created in the line of code that begins with app.Inspectors. Because the Inspectors object returned by app.Inspectors is not stored in a permanent variable, the temporarily created Inspectors object is garbage collected, and the event connected to it will never get called.

The fix for this issue is to declare a variable whose lifetime matches the lifetime of your event handler and set it to the Office object for which you are handling the event. Listing 1-22 shows a rewritten class that successfully listens to the CommandBarButton Click event. This class works because instead of using the method-scoped variable btn, it uses a class-scoped member variable called myBtn. This ensures that the event handler will be connected for the lifetime of the class when ConnectEvents is called.

Listing 1-23 shows a similar fix for our failed Outlook example. Here we declare a class-level variable called myInspectors that we assign to app.Inspectors. This ensures that our event handler will be connected for the lifetime of the class when ConnectEvents is called because the lifetime of myInspectors now matches the lifetime of the class.

When Method Names and Event Names Collide

In several cases in the Office object models, an object has an event and a method that have the same name. For example, Excel’s Workbook object has an Activate event and an Activate method. Outlook’s Inspector and Explorer objects have Close events and Close methods.

When using an Office object model object that has events such as Workbook, you are actually using an object that implements several interfaces. One of those interfaces has the definition of the Close method and a separate interface has the definition of the Close event. To handle an event for which a method name collides, you must cast your object to the interface that contains the event definitions. The interface that contains the event interfaces is named ObjectNameEvents_Event, where ObjectName is the name of the object, such as Workbook or Inspector.

Listing 1-24 casts the Workbook object myWorkbook to Excel.WorkbookEvents_Event when adding the event handler. By casting myWorkbook to the WorkbookEvents_Event interface, we disambiguate between the Close method (which is on the Workbook interface) and the Close event (which is on the WorkbookEvents_Event interface).

The Office Primary Interop Assemblies (PIAs)

Before learning any more about how to build Office solutions, you need to understand in more detail the managed assemblies that you use to talk to the Office object model in .NET. The managed assemblies used to talk to Office are called the Office primary interop assemblies (PIAs).

As mentioned previously, when you are talking to an Office object model in .NET, you talk to it through a .NET technology called COM interop. The Office object models are all written in unmanaged code (C and C++) that exposes COM interfaces. To talk to these COM interfaces from managed code (C# or Visual Basic), you talk via a wrapper that allows managed code to interoperate with the unmanaged COM interfaces of Office. This wrapper is a set of .NET classes compiled into an assembly called a PIA.

The word primary is used when describing these assemblies because they are the Office-approved wrappers for talking to the Office object models. This designation is needed because you could create your own wrapper for the Office COM object models by using a tool provided with .NET called TLBIMP. A wrapper you create on your own is called an interop assembly (IA) rather than a PIA. Even though you might be tempted to go play with TLBIMP and build your own interop assemblies, you should never use anything other than the Office-provided interop assemblies to do Office development. If every developer created his or her own sets of wrappers for Office development, then no Office solution could interoperate with anyone else’s solution; each interop wrapper class of, say, Worksheet created by each developer would be considered a distinct type. Even though the interop assembly I created has a Worksheet object and the interop assembly you created has a Worksheet object, I cannot pass you my Worksheet object and you cannot pass me your Worksheet object. We need to both be using the same interop assembly: the primary interop assembly.

A second reason to not build your own interop assemblies is that Office has made special fixes to the PIAs to make them work better when doing Office development. If you generate your own, then you are likely to run into issues that are fixed in the PIAs.

Installing the PIAs

The Office 2007 PIAs are available through the Office 2007 installer. The Office 2007 PIAs are also available as a Microsoft Windows Installer package that you can redistribute with your application. To install the Office 2007 PIAs through the Office 2007 Installer, when you do a setup, check the Choose advanced customization of applications check box in the first step of the Office 2007 Setup Wizard. Then in the tree control that appears in the next screen of the wizard, you will see a .NET Programmability Support node under each application for which PIAs are available, as shown in Figure 1-3. Click each of these .NET programmability support nodes and make sure that you set Run from my computer. Also, under the Office Tools node in the tree, you might want to turn on Microsoft Forms 2.0 .NET Programmability Support and Smart Tag .NET Programmability support. A second method to getting the Office 2007 PIAs is to do a complete install of Office 2007-all the .NET programmability support will be turned on for you automatically.

The Office PIAs get installed to the Global Assembly Cache (GAC). The GAC is usually in the Assembly subdirectory of the Windows directory.

A number of Office PIAs are available. Table 1-5 lists some of the most common ones. One PIA listed here of note is the Office.dll PIA, which is where common types that are shared between the Office applications such as CommandBar are found.

Referencing the PIAs

Adding a reference to a PIA is not necessary for most VSTO projects because the reference is automatically added for you. The console application examples in this book, such as the ones that automate Excel, can be typed into a Visual Studio console project and compiled, but you must first add a reference to the necessary PIA

. To add a reference, right-click the References folder under your project in the Visual Studio Solution Explorer, as shown in Figure 1-4. Choose Add Reference from the menu that pops up when you right-click the References folder.

Choose the COM tab of the Add Reference dialog box that appears, as shown in Figure 1-5. The COM references are listed by component name, which matches the description column in Table 1-5. So, to add a reference to the Excel PIA, you choose the Microsoft Excel 12.0 Object Library and click the OK button to add the Excel 2007 PIA reference to your project, as shown in Figure 1-5.

Note in Figure 1-5 that the Path column in the COM tab of the Add References dialog box displays the path to the COM library that the PIA wraps. For example, the Microsoft Excel 12.0 Object Library points to the location on your machine of the Excel.EXE executable. When you select these references and close the dialog box, you can examine the properties of the actual references that were added by expanding the References folder in the project, right-clicking the references that you added, and choosing Properties. You will see that Visual Studio figures out the PIA managed object in the GAC that corresponds to the COM object you selected. In this case, you will not get a reference to the Excel.EXE executable but instead to the Microsoft.Office.Interop.Excel.dll in the GAC.

Finally, note that even though you did not explicitly add a reference to the Microsoft Office 12.0 Object Library (office.dll), a reference is added for you. This is because the Excel 12.0 Object Library uses types from the Microsoft Office 12.0 Object Library. Visual Studio detects this and adds the required Office PIA to your project references automatically.

When you look at the PIA you have referenced in the object browser in Visual Studio, you might find yourself very confused. The object browser shows many helper objects that are created as part of the interop wrapper. For example, consider what .NET Interop does to the seemingly simple Excel Application object. It turns it into a multiple-headed (8 heads to be exact, 36 if you count each delegate individually) monster. All of the following are public types that you see in the browser related to the Excel Application object:

Interfaces

_Application

AppEvents

AppEvents_Event

Application

IAppEvents

Delegates

AppEvents_*EventHandler (29 of them)

Classes

AppEvents_SinkHelper (AppEvents)

ApplicationClass (_Application, Application, AppEvents_Event)

This pattern repeats for Chart, OLEObject, QueryTable, Worksheet, and Workbook.

Let’s try to untangle this mess by working our way backward from the original COM definition of the Excel Application object. The COM coclass for the Application object looks like this-it has two interfaces, a primary interface called _Application and an event interface called AppEvents. You can think of a coclass as something that defines the interfaces that a COM class implements.

coclass Application {
        [default] interface _Application;
        [default, source] dispinterface AppEvents;
    };

TLBIMP (which is used to process the COM type library for Excel and make the PIA) directly imports the _Application and AppEvents interfaces, so this explains where two of the eight types come from. But the AppEvents interface is not very useful-it seems like an artifact of the TLBIMP conversion in some ways. It has to be processed further to create another interface described later called AppEvents_Event to be of any use.

When TLBIMP processes the COM coclass, it creates a .NET class called ApplicationClass, which is named by taking the coclass name and appending Class. It also creates a .NET interface with the same name as the coclass called Application for our example. If you look at Application in the browser, it has no properties and methods of its own, but it derives from the other two interfaces associated with the coclass: _Application and AppEvents_Event.

We have not yet explained where the AppEvents_Event interface comes from. When TLBIMP processes the AppEvents event interface on the coclass, it creates several helper types. First it creates AppEvents_ Event, which looks like AppEvents but with events and delegate types replacing the methods in AppEvents. It also creates delegates named AppEvents_*EventHandler, where * is the method name for each method on the original AppEvents interface. Finally, it creates an AppEvents_ SinkHelper, which can be ignored.

That leaves only the IAppEvents interface unexplained. TLBIMP imports this interface directly because it is a public type in the Excel type library. You can ignore this also. This is effectively a duplicate of AppEvents, except AppEvents is declared as a dispinterface in the type library, and IAppEvents is declared as a dual interface type.

So which of these do you really use? Basically, you should only use in your code the Application interface (which derives from _Application and AppEvents_Events) and the delegates. You can usually pretend the others do not exist. The one exception to this rule is when a method and event name collide, as described earlier in this chapter. To disambiguate between a method and an event you must cast to the _Application interface when you want to call the method or the AppEvents_Event interface when you want to connect to the event. Table 1-6 presents a summary.

The Application interface that is created by TLBIMP for the coclass behaves in an interesting way. You can write code in C# that makes it look like you are creating an instance of the Application interface, which we all know is impossible:

Excel.Application myApp = new Excel.Application();

Really, this is syntactical sugar that is using the ApplicationClass behind the scenes (the Application interface is attributed to associate it with the ApplicationClass) to create an Excel Application object and return the appropriate interface.

Finally, we mentioned earlier that this pattern repeats for Chart, OLEObject, QueryTable, Worksheet, and Workbook. The mapping to Chart is straightforward-replace Application with Chart and AppEvents with ChartEvents and you’ll get the general idea. Worksheet is a bit different. Its coclass looks like this:

coclass Worksheet {
        [default] interface _Worksheet;
        [default, source] dispinterface DocEvents;
    };

So for Worksheet, replace Application with Worksheet, but replace AppEvents with DocEvents-yielding DocEvents_*EventHandler as the delegates for WorkSheet events.

QueryTable is even weirder. Its coclass looks like this:

coclass QueryTable {
        [default] dispinterface _QueryTable;
        [default, source] dispinterface RefreshEvents;
    };

So for QueryTable, replace Application with QueryTable and replace AppEvents with RefreshEvents-yielding

RefreshEvents_*EventHandler as the delegates for QueryTable events.

When you look at the Excel PIA in the object browser in Visual Studio, you might notice a slew of methods with the text Dummy in them. There’s even an interface called IDummy.

No, this is not Excel’s way of insulting your intelligence. Everything with Dummy in it is a test method that actually has a legitimate purpose and more descriptive names in Microsoft’s internal “debug” version of Excel. For example, Application.Dummy6 is called Application.DebugMemory in the debug version of Excel. Each method was renamed to Dummy in the retail version of Excel. All 508 of these Dummy methods actually do something in debug Excel, but in the retail version of Excel, they do nothing except raise an error when called.

Excel has marked these as “hidden,” but the C# object browser shows hidden methods by default. When you view the PIA in the C# object browser, you will see these Dummy methods. If you create a Visual Basic project, the Visual Basic object browser will hide methods and properties with this attribute.

Conclusion

This chapter introduced the Office object models and examined the basic structure followed by the Office object models. You learned how to work with objects, collections, and enumerations-the basic types found in any object model. You also learned how to use properties, methods, and events exposed by objects and collections in the Office object models.

This chapter introduced the Office primary interop assemblies that expose the Office object models to .NET code. You learned how to use and reference Office PIAs in a Visual Studio project. This chapter also described what you can ignore when viewing the PIA in the object browser.

The next chapter begins examining the basic patterns of development used in Office programming and provides examples of each.