Perl is a language that has been around for a while and is one of the most popular open source languages among system administrators, Web developers and the research community.

Meanwhile, Microsoft's .NET technology, which comprises a framework and a set of tools for creating sophisticated applications, was recently released. Is it possible to connect these two different worlds? Yes, it is! Perl is now a .NET language. This is the first of a two-part series written to introduce and explore the tools and technologies that are giving Perl and .NET a new dimension.

Every once in a while a language comes along that gains the hype of the programming literati. In recent years, Java and now C# have stolen many of the headlines. However, for the past fifteen years, there has been a programmer's diamond in the rough: Perl. When Larry Wall created this language in 1987, it quickly became the language of choice for system administrators. With the growth of the Internet and, in particular, dynamic content, it has been the language of choice for Common Gateway Interface (CGI) applications. Now, with the introduction of .NET from Microsoft and PerlNET from ActiveState, Perl has become one of the standard .NET languages. Perl is an Open Source language; PerlNET, however, is a commercially licensed technology. In this and the second article of this series, I'll explore this interesting transition of Perl into a .NET language.

Let's begin by first demonstrating the features of Perl that make it such a gem. Two of these features are: regular expressions and associative arrays (hashes). Through a series of examples, I'll illustrate these and other features (you can download the source code for this article from: http://www.code-magazine.com/downloads/sepoct2002perl.zip). A hash is a data structure that allows a string to be used as an index. This data structure enables the programmer to avoid the cost of a table lookup. A regular expression allows a programmer to perform pattern matching. This technique is the cornerstone of data validation and other user interface queries.

This Camel Gets You a Long Way!

Perl is a weakly typed language although there is support for three different variable types: scalars, arrays and hashes. A scalar variable stores any single value whether it is a number, a string, a paragraph or an entire file. A scalar variable begins with a $ symbol. An array stores a collection of scalars. An array variable begins with a @ symbol. To select an array element, you use the [ ] notation. A hash stores a collection of key-value pairs and begins with a % symbol. A value is selected from a hash by using the { } subscript notation. Listing 1 summarizes these simple ideas. Anything from the # to the end of a line is a comment. On a lighter note, what would you call a variable that is declared as %brown?

Perl comes with a plethora of built-in functionality. In Listing 1, I used the print function and the sort function. The \n sequence represents the newline character; it is interpreted only inside a double-quoted string.
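Listing 1 is not reproduced here, so the following is only a minimal sketch of the three variable types together with print and sort. The variable names and sample values are illustrative assumptions, not taken from the listing.

```perl
use strict;
use warnings;

# Scalar: a single value (number or string).
my $language = "Perl";

# Array: an ordered collection of scalars, indexed with [ ].
my @animals = ("llama", "camel", "alpaca");

# Hash: key-value pairs, indexed with { }.
my %legs = (camel => 4, ostrich => 2);

print "Language: $language\n";
print "First animal: $animals[0]\n";      # [ ] selects an array element
print "A camel has $legs{camel} legs\n";  # { } selects a hash value

# sort returns a new, sorted list; the original array is unchanged.
my @sorted = sort @animals;
print "Sorted: @sorted\n";
```

Note that the sigil follows the value being accessed: a single element of @animals or %legs is a scalar, so it is written with a $.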

As you learn Perl, you will become increasingly amazed at the economy of expression in the language. For example, you can print lines typed at the keyboard with:

while($line = <STDIN>) #$line gets input
{
   print $line; #print it
}

The phrase <STDIN> allows Perl to read a line at a time from the keyboard. Since this is a loop, each line is read and tucked away in the scalar $line. If you just say:

while(<STDIN>) # $_ gets input
{
  print; # print $_
}

Then by default, the special Perl variable $_ holds the line that was just read. Also, $_ is printed by default. Even though the {} notation is necessary above, the following form of the while allows you to eliminate them:

print while(<STDIN>);

You can just as easily read from files, but first you have to create your own file handle:

open(MYFILE, "< text"); # open for input
print while(<MYFILE>);

You can also create your own output file:

open(MYFILE, "< text"); # open for input
open(OUTPUT, "> out"); # open for output
print OUTPUT while(<MYFILE>);
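The short forms above do not check whether a file could actually be opened. As a hedged sketch of a more defensive variant, the subroutine below uses the three-argument form of open with lexical file handles and reports failures with die. The subroutine name copy_file and the file names in the usage comment are assumptions made for illustration.

```perl
use strict;
use warnings;

# Copy a text file line by line. Both file names are supplied by the caller.
sub copy_file {
    my ($source, $target) = @_;

    # Three-argument open keeps the mode separate from the file name,
    # and "or die" reports a failure instead of silently continuing.
    open(my $in,  '<', $source) or die "Cannot read $source: $!";
    open(my $out, '>', $target) or die "Cannot write $target: $!";

    # Each line read from $in lands in $_ and is printed to $out.
    print {$out} $_ while <$in>;

    close($in);
    close($out) or die "Cannot close $target: $!";
    return 1;
}

# Usage (hypothetical file names): copy_file("text", "out");
```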

OK. That's enough file manipulation. Now let's see what's so special about hashes. Hashes have utility in a surprisingly large number of applications. For example, let's try to count the occurrences of each word in a document. For simplicity, consider a word to be anything surrounded by white space. See Listing 2 for the solution to this problem.

A little explanation is necessary here. There is a Perl subroutine on lines 1 through 4 that is called on line 15. The while loop reads lines from the standard input and converts them to lower case. The split function splits $_ on white space and returns an array consisting of the words on this line. The foreach on line 9 loops over each word on the line and executes the following statement for each word:

$counts{$word}++;

This statement simply adds one to the value for each word encountered in the input. The loop on line 15 is a bit tricky.

sort bysize keys(%counts)

This expression sorts the keys using a special sort subroutine shown on lines 1 through 4. This subroutine compares two keys by their counts, so the keys come out ordered from least to most frequent. If you ran that Perl code on all the text in this article through this paragraph, you would find the last few lines of output to be:

of 24
a 41
the 49
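Since Listing 2 is not shown here, the following is a rough sketch of the same idea rather than the listing itself. It counts words from a fixed list of sample lines instead of standard input so that it can be run as is; the subroutine name count_words, the sample text and the alphabetical tie-break in bysize are assumptions.

```perl
use strict;
use warnings;

# Count how often each whitespace-delimited word occurs in the given lines.
sub count_words {
    my @lines = @_;
    my %counts;
    foreach my $line (@lines) {
        # lc folds case; split ' ' splits on runs of white space
        # and ignores leading white space.
        foreach my $word (split ' ', lc $line) {
            $counts{$word}++;
        }
    }
    return %counts;
}

my %counts = count_words("The camel and the llama", "the end");

# Sort keys by ascending count so the most frequent words print last;
# ties are broken alphabetically to keep the output stable.
sub bysize { $counts{$a} <=> $counts{$b} or $a cmp $b }

foreach my $word (sort bysize keys %counts) {
    print "$word $counts{$word}\n";
}
```

The hash does the bookkeeping: the first time a word is seen, $counts{$word} springs into existence, and ++ treats the missing value as zero.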

Let's leave hashes and take a look at Perl's regular expression capability. Suppose you ask the user of your program to enter lines consisting of a name and a number and nothing else. Let's further suppose that there can be any number of spaces or tabs between the name and the number. This is a perfect example of where a regular expression can be used, as shown in Listing 3.

In Perl, regular expressions are enclosed within a pair of slashes. In Listing 3, the regular expression is compared to $_. In order to understand the regular expression, let's separate it into its component parts:

^ begins with
[A-Z] upper case character
[a-z]+ one or more lower case characters
[ \t]+ one or more blanks and tabs
\d+ one or more digits
$ ends with

Taken all together, this pattern matches a string if the string begins with an upper case character followed by one or more lower case characters followed by one or more spaces and/or tabs and ends with one or more digits.

The parentheses do not play any role in the match but, if there is a match, the portion of the match enclosed within each set of parentheses is remembered and stored in the special variables $1, $2 and so on.
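To make the pieces concrete, here is a small sketch that assembles the components above into one pattern and reads the captured values. Listing 3 is not reproduced here, so the placement of the parentheses, the subroutine name parse_entry and the sample lines are assumptions.

```perl
use strict;
use warnings;

# Return (name, number) if the line has the expected shape, or an empty list.
sub parse_entry {
    my ($line) = @_;
    # The parentheses capture the name into $1 and the number into $2.
    if ($line =~ /^([A-Z][a-z]+)[ \t]+(\d+)$/) {
        return ($1, $2);
    }
    return;
}

foreach my $line ("Mike   42", "mike 42", "Susan\t7") {
    my ($name, $number) = parse_entry($line);
    if (defined $name) {
        print "name=$name number=$number\n";
    } else {
        print "no match: $line\n";
    }
}
```

The second sample line is rejected because it starts with a lower case letter, which [A-Z] does not accept.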

Lots of system administrators love Perl because it has all the tools necessary to produce reports about resource use, such as users and files. The next example demonstrates a program that receives an integer representing a certain number of days. The program produces a listing of all the files that have been modified within that many days, as shown in Listing 4.

The first two lines are for error checking. The special array @ARGV contains the list of arguments from the command line. Each array has a variable $#arrayname associated with it that gives the subscript of the last element. In this case, if one argument is provided, then the highest subscript will be zero. Thus, if $#ARGV is not zero, the program terminates. The program also terminates if the opendir call fails. Otherwise, line 3 uses the readdir function to read all the entries in the current directory. The loop on line 4 loops through all these entries, using the -f file inquiry operator to eliminate directories. The -M operator returns the age of the file, in days, since it was last modified. Finally, if this age is less than the number supplied on the command line, the program prints the name of the file.
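The same steps can be sketched as a subroutine, which makes the logic easy to try against any directory. This is not Listing 4 itself: the name recent_files and the choice to take the directory as a parameter (rather than always using the current directory) are assumptions.

```perl
use strict;
use warnings;

# Return the names of plain files in $dir modified within the last $days days.
sub recent_files {
    my ($dir, $days) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @entries = readdir($dh);
    closedir($dh);

    my @recent;
    foreach my $entry (@entries) {
        my $path = "$dir/$entry";
        next unless -f $path;                        # skip directories and other non-files
        push @recent, $entry if (-M $path) < $days;  # -M gives the age in days
    }
    my @sorted = sort @recent;
    return @sorted;
}

# Usage: plain files in the current directory changed in the last 7 days.
print "$_\n" foreach recent_files(".", 7);
```

The age reported by -M is measured from the moment the script started, so a file written while the script is running can have a slightly negative age.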

As you have seen, Perl uses a lot of built-in functions. It's also very simple to write your own function. I actually used one earlier. Perl functions have an elegant behavior when it comes to argument passing. You can pass a variable number of arguments to any programmer written function. All of the arguments are collected into the special array @_. Listing 5 demonstrates this feature. It's also worth mentioning that all Perl functions return a value, the last expression evaluated in the function. If you don't want to use the returned value, you can just ignore it.

On line 3, the my declaration makes the variable $total private to the function, so it does not collide with a same-named variable elsewhere in the program. The function compute_mean collects all of the parameters in the special array @_. The foreach on lines 4-7 sums them together. Line 8 uses the array @_ in a scalar context, which forces Perl to treat @_ as a number, that is, the size of the array. Note that the keyword return on line 8 is not necessary.
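A minimal sketch of such a function follows. It mirrors the description above but is not Listing 5 itself; in particular, the guard against an empty argument list is an addition made here to avoid dividing by zero.

```perl
use strict;
use warnings;

# Return the arithmetic mean of all arguments, or undef for an empty list.
sub compute_mean {
    return undef unless @_;      # guard added here; not part of the description
    my $total = 0;               # "my" keeps $total private to this call
    foreach my $value (@_) {
        $total += $value;
    }
    # Division puts @_ in scalar context, where it yields the argument count.
    return $total / @_;
}

print compute_mean(80, 50, 95), "\n";   # three arguments
print compute_mean(1 .. 10), "\n";      # ten arguments
```

The same subroutine accepts three arguments or ten without any change to its definition, because everything arrives in @_.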

The variable nature of Perl functions is not always a blessing. For example, if you wanted to send several arrays to a function and return an array containing the sum of the elements in each array, there would not appear to be a way to determine where each array began and where each array ended. To solve problems like this, you need to know about references.

A reference is a scalar whose contents are the address of another variable. So a reference is a pointer. Use the \ operator to take the address of a variable and either the $, @ or % operator to de-reference it, depending on what it is referring to. Consider the following code snippet:

@data = (80, 50); # create an array
$ref = \@data; # take its address
print "@data\n"; # print array directly
print "@$ref\n"; # ...indirectly through $ref
print "$data[0]\n"; # print 0th element directly
print "$$ref[0]\n"; # print 0th element indirectly

Using the simple concepts above, you can now write a Perl subroutine that returns the sums of individual arrays sent to it, as shown in Listing 6.

On line 18, three references are sent to the sums function. Each time the loop on line 4 is executed, $value will hold one of these array references. On line 7, this value must be dereferenced in order to get at the actual values in the array being referenced. On line 11, the push function is used to push the sum (for the array being processed) onto the @answers array. This underscores the fact that all Perl arrays are dynamic and can grow or shrink to meet programming demands. Finally, line 13 forces the evaluation of the @answers array so it can be returned. In Perl, the result of the last evaluated expression becomes the return value of the subroutine.
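The approach can be sketched as follows. This is an illustration in the spirit of Listing 6, not the listing itself; the variable names and sample arrays are assumptions, and an explicit return is used for clarity.

```perl
use strict;
use warnings;

# Take any number of array references and return one sum per array.
sub sums {
    my @answers;
    foreach my $ref (@_) {
        my $sum = 0;
        foreach my $item (@$ref) {   # @$ref dereferences the array reference
            $sum += $item;
        }
        push @answers, $sum;         # @answers grows as needed
    }
    return @answers;
}

my @first  = (1, 2, 3);
my @second = (10, 20);
my @third  = (5);

# Passing \@first rather than @first keeps the three arrays separate in @_.
my @totals = sums(\@first, \@second, \@third);
print "@totals\n";
```

Had the arrays been passed directly, @_ would hold a single flat list of six numbers with no record of where one array ended and the next began.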

So far I have shown some of Perl's power: its economy of expression, hashes, regular expressions, file inquiry operators and built-in subroutines. Now I'll move on to explore Perl as a .NET language.

PerlNET

PerlNET is the new product released by ActiveState as part of their Perl Development Kit (PDK Version 4.1.1). PerlNET enables you to create and use .NET assemblies in Perl. The architecture of PerlNET is elegant, in that it makes it easy to wrap existing Perl modules and serve them to other .NET programs. PerlNET actually executes the Perl code using the standard Perl interpreter as "unmanaged code." First, I'll explore PerlNET with a simple console example, then move on to writing and using .NET components and wrapping existing Perl modules with PerlNET. Lastly, I'll conclude this article with a discussion of programming Windows Forms in PerlNET.

Say Hello to PerlNET

Let us begin with a most familiar programming theme, the "Hello World!" example. Listing 7 shows the necessary code to say "Hello World!" using PerlNET.

The Perl reserved word use in lines 1 and 2 tells the Perl interpreter and the PerlNET compiler to include .NET's System namespace. The qw(AUTOCALL) import instructs PerlNET to retry, as a .NET call, any call for which no Perl method was found. Thus, if I remove line 2 in Listing 7, I have to rewrite line 4 using PerlNET::Call, which is the generic way of calling any .NET static method (also known as a class method).

PerlNET::Call("Console.WriteLine", "Long Hi!")

The Perl Development Kit provides plc.exe, a Perl compiler that compiles PerlNET code to .NET assemblies. The code in Listing 7 is compiled as follows:

plc -target=exe HelloWorld.pl

One pitfall with the PerlNET compiler is that it checks .NET types and methods only at runtime instead of at compile time! Therefore, if you had typed WriteLine as writeline, you would find out only while running the program.

A Different Type of Animal

One of the proven ways to implement complex applications is to break the monolithic applications into reusable, integral software programs as components. Getting data across the components with correct data types is the crucial challenge in any component-based programming. The task of solving this is called Type Marshalling. In cases like PerlNET, this gets more challenging as it bridges between two different technologies: a loosely typed language and a strongly typed framework! And PerlNET could not have made it any easier. The following code snippet shows one of the ways to achieve type marshalling, which is by using a static method of Convert class:

$x = "50";
print Math->Cos(Convert->ToDouble($x));

PerlNET also provides type conversion Perl functions to be used while passing parameters or returning values to .NET. The following table lists those functions.

This works fine for converting method parameters, which is needed when writing simple, pure Perl programs that interact with .NET assemblies. But to write our own PerlNET components or extend .NET components, there is a need to specify the types at the definition level. While interfacing with .NET, PerlNET uses special comment blocks called POD (Plain Old Documentation) to define types.

On the interface side, the important ones are: Methods, which perform actions; Properties, which are accessed via getter and setter methods; Fields, which are accessed directly without any methods; and Constructors, which create the objects. In terms of types, PerlNET supports three kinds of types: pure Perl, .NET and mixed. Let's see some examples of Pure Typed programs.

Pure Types

Listing 8 provides an example of building a pure-typed .NET component using PerlNET. The component and a C# client for it (CityClient.cs) are compiled, respectively, as:

plc -target=library City.pm
csc /reference:City.dll CityClient.cs

In reference to Listing 8, the easy way to design components in PerlNET is to design them as Perl modules. A Perl module is a reusable package defined in a file whose name is the package name with ".pm" appended; the package statement defines its namespace. When a subroutine is called as a method, Perl implicitly passes the invocant (the package name for a class method, or the object for an instance method) as the first parameter. This is often gathered into a variable named $self. You can see that in lines 35, 46 and 56 of the code.

In lines 5 through 22 you see multiple =for interface statements. These are the POD statements through which PerlNET defines the types of all the methods, constructors and attributes; in general, all the interface elements. In our example, the constructor is declared in line 9. But I don't have a subroutine by that name. Why is that? The declaration in line 9 is there for interfacing purposes only: Perl constructors are conventionally named new, not after the class. The new subroutine is defined in line 24. This subroutine creates the object with the help of the bless built-in, which attaches a given hash (in our case, $city) to the current class. This enables subsequent accesses to the member variables in terms of this blessed hash. Line 8 specifies to PerlNET that this is a pure Perl component.
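The plain Perl side of this pattern can be sketched without PerlNET. The example below omits the =for interface blocks entirely and is not Listing 8; the field names, accessor methods and sample values are hypothetical, chosen only to show how new, bless and $self fit together.

```perl
use strict;
use warnings;

package City;

# Constructor: conventionally called "new". $class receives the package name.
sub new {
    my ($class, $name, $population) = @_;
    my $city = { name => $name, population => $population };
    return bless $city, $class;   # associate the hash with the City package
}

# Instance methods receive the blessed hash as their first argument.
sub name {
    my ($self) = @_;
    return $self->{name};
}

sub population {
    my ($self) = @_;
    return $self->{population};
}

package main;

# Sample values are illustrative only.
my $city = City->new("Lima", 8_000_000);
printf "%s has %d people\n", $city->name, $city->population;
```

Because the object is just a blessed hash, a module written this way can still be used from an ordinary Perl program.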

Pure-typed PerlNET components cannot implement fields or virtual methods. Also, they cannot inherit from a .NET type. When I need any of those capabilities, I can create .NET-typed or mixed-typed components instead. An advantage of pure Perl components is that they can be used even with normal Perl programs, making it easy to wrap existing Perl modules and classes and create .NET component assemblies.

PerlNET also expresses the public, private and protected access modifiers in =for interface blocks. The following code snippet declares name as a private member:

=for interface
  private int name;
=cut

Next, let's see how to use a .NET component in Perl. Listing 9 shows an example Perl program that consumes the component created by Listing 8.

Note that PerlNET::true in Listing 9 could be shortened to true by importing it as follows:

use PerlNET qw (AUTOCALL true);

The code in Listing 9 should be compiled by referencing the City.dll that was created before:

plc -reference=City.dll CityPerlClient.pl

Wrapping Existing Modules

In the following example, I wrap an existing Perl module with PerlNET and use it from a C# program. With the wealth of available modules, this is an interesting application of PerlNET. Listing 10 shows the wrapper for a Perl module available from CPAN, called Spell.pm. This module provides a method that returns a string that spells out the given integer. Listing 11 shows a simple C# program that uses this Perl module via the wrapper. The small amount of code needed shows how elegantly PerlNET achieves this kind of integration. You can download the Spell module (not the wrapper) from CPAN.ORG at: http://search.cpan.org/search?dist=WWW-SMS. If you have installed Perl, you can install this module without downloading it manually by using the Perl Package Manager (ppm.bat). Refer to the Perl Package Manager's help information for further instructions on how to install Perl modules.

There are certain key things to watch when wrapping existing Perl modules with PerlNET. You have to give your PerlNET module the same name as the Perl module that you are wrapping. Also, there should be no methods with the same name as the module; if there are, they will be treated as constructors. And, finally, hashes are not allowed in .NET, so any hashes returned by subroutines should be converted to arrays before being passed on to .NET. PerlNET provides a handy method modifier, wantarray!, that automatically converts a return list into a .NET-compatible array. For example, given the following definition:

wantarray! Str() names();

it becomes easy to return a list from the names method:

sub names {
  return qw(apples oranges bananas );
}

.NET Types

The key difference between Pure Types and .NET Types in PerlNET is that, in .NET Typed programs, PerlNET passes references to objects instead of hashes.

This includes even the constructor. As the constructor gets a reference to an already constructed object, it becomes an initializer rather than constructor. .NET Typed assemblies can also have Fields. The following code snippets show the key areas where this differs from Pure Typed assemblies:

sub MyClass ## Constructor
{
  #Get the reference to this object
  #Note: We get a reference even though
  #constructors are static methods.
  my ($this, @args) = @_;
}

#Static method
sub StaticMethod
{
  #There is no reference to this object
  my (@args) = @_;
}

#Non Static method
sub NonStaticMethod
{
  #There is a reference to this object
  my ($this, @args) = @_;
}

Mixed Types

Mixed Types differ from .NET Types only in the way that Mixed Types can store references inside a blessed hash. In all other aspects, Mixed Types behave the same way as the .NET Typed assemblies. This means that, in all the subroutines in the component, I get both a reference to the blessed hash and a reference to the object. Therefore, it is common to see the first line of such a method as:

my ($this, $self, @args) = @_;

In building components to inherit from, or be inherited by, other .NET components, you should use Mixed and .NET Types.

In summary, PerlNET handles type marshaling through conversion functions and =for interface POD comments. PerlNET components can interface with .NET in three different ways: as a Pure Perl component, as a .NET component and as a mixed component. In addition, you can wrap existing Perl Modules to produce a .NET assembly. I briefly explained when to use which type of interface. In the next section, I'll show how to develop Windows GUI programs using PerlNET.

Building Windows GUI Using PerlNET

Windows GUI programming is one of the key benefits of PerlNET. Although other technologies, such as Tcl/Tk, can produce cross-platform GUIs from Perl, getting access to the .NET Framework's GUI classes is like opening a treasure vault! ActiveState provides a product called Visual Perl that integrates closely with Visual Studio .NET. However, drag-and-drop development has yet to be realized. The code in Listing 12 shows a simple "Hello World" application.

This code should be compiled as:

plc -target=winexe HelloWindow.pl -reference=System.Windows.Forms.dll

Figure 1 shows the Windows form that displays when running HelloWindow.exe.

Note that the program is compiled with winexe as the target. This enables the compiled executable to run in its own thread. If I were to omit this option, then compile and run the resulting executable from a DOS window, the DOS window would be locked until the HelloWindow program was closed. The necessary components (System.Windows.Forms.dll in this case) are made available to HelloWindow.exe by the reference compiler option. The attribute [STAThread] used in line 10 ensures that the compiled assembly runs in a single-threaded apartment (STA). Lines 3 through 5 provide the namespaces required to use the .NET Framework's Windows Forms classes. Line 9 indicates to the compiler that the HelloWindow package inherits from the Windows Forms class.

Line 16 instantiates the HelloWindow while the next line runs it using the static method Run of the Application class. Lines 23 and 24 set the text and size attributes of the form. Note that the attributes are not referred through the blessed hash but through a reference to the object itself.

Delegation, a key in the event-oriented GUI paradigm, is handled in PerlNET by add_ methods. These methods enable you to add your subroutines as event handlers. For example, the code for a MouseDown event would be:

=for interface
. . .
private void Form_MouseDown(any sender,
                     MouseEventArgs evArgs);
. . .
=cut
. . .
sub _init
{
  my ($this) = shift;
  $evtHandler = MouseEventHandler->new($this,
                  "Form_MouseDown");
  $this->add_MouseDown($evtHandler);
}

PerlNET provides an easy way to access all the properties and methods of an object without having to type the object name all the time. This is done by the with method. Using with, lines 23 and 24 could be rewritten as:

with ($this, Text=> "Simple Windows Forms",
          Size=> Size->new(int(300), int(200)));

In this first article of a two-part series, I have demonstrated some of the power of the Perl language. I highlighted features like economy of expression, regular expressions, hashes, file inquiry operators and subroutines. I then turned to the main topic, Perl in .NET, by providing a brief introduction to PerlNET and detailing how to create and consume .NET components using Perl. I then explored how to wrap existing Perl modules in a .NET assembly. Finally, I showed how to create a simple Windows program using Perl.

In the next article, I'll explore how to access databases using ADO.NET and Perl. In addition, I'll discuss how to create Web pages and Web services using Perl. Until then, happy programming!

References

www.perl.com

A great starting point for any Perl-related search.

www.cpan.org

Comprehensive Perl Archive Network (CPAN), which hosts the collection of Perl modules and other information contributed by the Perl open source community.
