CPAN

With bits and bobs, we’ve covered much of the core functionality of Perl, but perl also comes with dozens of useful modules in its standard distribution. CPAN, the Comprehensive Perl Archive Network, expands on this with an enormous wealth of contributed modules that you can install. There are three ways of doing this:

The most direct method is to go to CPAN, download the module you want (as a gzipped tarball), decompress and untar it, change into the directory you’ve just unpacked, and build the module with dmake or whatever make program you have available. So, if you wanted to install the Parse::RecDescent module, you’d download the Parse-RecDescent-1.967009.tar.gz tarball from CPAN, and then:

gzip -d Parse-RecDescent-1.967009.tar.gz
tar -xf Parse-RecDescent-1.967009.tar
cd Parse-RecDescent-1.967009
perl Makefile.PL
dmake
dmake test
dmake install

However, this relies on a number of Unix utilities that you may not have if you’re on Windows (though they are easy to obtain), and it will fail if the module depends on other modules that you haven’t already installed.

The other two ways are much easier. The first is to use the CPAN.pm module, which comes with the core perl distribution. If you type:

perl -MCPAN -e shell

at a command prompt, it will open a CPAN shell that can install modules for you. Accept the default configuration options. To install a module, all you then need to type into the CPAN shell is:

install Parse::RecDescent

and this will be done automatically.

The alternative for those using the ActiveState port of perl on Windows is to use the Perl package manager, ppm. If you type:

ppm install Parse-RecDescent

at a command prompt, ppm will do its best to install the module for you. This technique relies on someone else having done the equivalent of the Makefile.PL/dmake dance on Parse::RecDescent and uploaded the result to the ActiveState ppm repository as a file named Parse-RecDescent.ppd (note the hyphen replacing the double colon). YMMV, as the ActiveState repository does not cover 100% of CPAN.

Some Perl modules contain extensions written in C (so-called XS modules), usually to increase the speed with which they are able to run. When these modules are built with make, the C is compiled and linked, for which you need a C compiler. If you are on Windows and using the ActiveState distribution, you will find it useful to install the MinGW C compiler so that you can compile such modules. This is a simple matter of typing:

ppm install MinGW

at a command prompt.
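Whichever method you use, a quick way to check that perl can actually find a module afterwards is to type perl -MParse::RecDescent -e 1 at a command prompt, which is silent if the module loads. To check several modules at once from a script, something like the following sketch works; installed_version() is a made-up helper name, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the installed version of a module
# ('' if it has no $VERSION), or undef if perl cannot load it.
sub installed_version {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Parse::RecDescent -> Parse/RecDescent.pm
    return undef unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? "$version" : '';
}

for my $module (qw(List::Util Getopt::Long Parse::RecDescent)) {
    my $version = installed_version($module);
    if ( defined $version ) {
        print "$module $version is installed\n";
    }
    else {
        print "$module is not installed\n";
    }
}
```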

Some modules I could not do without…

Command line option processing

The Getopt::Long module allows you to easily modify the behaviour of a script according to the switches you pass it:

use Getopt::Long;
GetOptions (
    # takes a list of ( switch => reference in which to put its value ) pairs
    'help' => \ my $help,
        # simple boolean switch: sets $help to 1 if --help is present,
        # and leaves it undefined otherwise
    'verbose!' => \ my $verbose,
        # the ! indicates that --verbose and --noverbose are both valid switches
    'source=s' => \ my $source,
        # =s indicates that the source option requires
        # a string argument, --source=c:/temp
    'number:i' => \ my $number,
        # :i indicates that the number option
        # takes an optional integer argument, --number=3
        # (a bare --number sets it to 0)
) or die "Unrecognised switches\n";
$number ||= 666;
    # $number is set to 666 if the switches left it unset (or false)
    # note that this would overwrite a switch of --number=0
$number = 666 unless defined $number;
    # use this line instead if you want --number=0 to be respected

Note the direct use of references to lexically scoped variables (\ my $var), which saves you having to declare the switch-recording variables before calling GetOptions(). Note also the use of ||= and unless defined to supply default values for switches that weren't set.

You can then run your script with the switches:

script.pl --verbose
script.pl --noverbose
script.pl --verbose --source="C:/temp/in.txt" --number=9
script.pl --help

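to enable variations on a theme. You can also get away with shortening the switch names to unambiguous prefixes, leaving out the equals signs and using a single dash, so that

script.pl --verbose --number=9 --source="C:/temp/in.txt"

can be shortened to

script.pl -v -nu 9 -s "C:/temp/in.txt"

(A bare -n won't do here: 'verbose!' also defines --noverbose, so n is an ambiguous prefix.)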

perldoc Getopt::Long for details.
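To check how the switches above will be parsed without typing out real command lines, Getopt::Long also exports (on request) GetOptionsFromArray(), which parses an array of your choosing rather than @ARGV. This sketch reuses the same switch specifications on a made-up argument list; note that a bare --number leaves $number at 0 rather than undefined, which is why the unless defined form of defaulting matters:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# A made-up argument list, parsed exactly as @ARGV would be
my @args = ( '--noverbose', '--number', '--source=C:/temp/in.txt', 'extra.txt' );

GetOptionsFromArray(
    \@args,
    'help'     => \ my $help,
    'verbose!' => \ my $verbose,
    'source=s' => \ my $source,
    'number:i' => \ my $number,
) or die "Bad command-line options\n";

# Non-option arguments (here extra.txt) are left behind in @args
print 'help:    ', ( $help ? 'yes' : 'no' ), "\n";
print "verbose: $verbose\n";    # --noverbose sets it to 0
print "source:  $source\n";
print "number:  $number\n";     # bare --number gives 0
print "left:    @args\n";
```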

Interacting with databases

The DBI (database interface) module allows you to talk to a database server such as MySQL or Oracle. To use it, you use DBI; and then tell it which driver (DBD:: module) to use to talk to the database server, e.g. DBD::mysql for MySQL. Here’s a brief foray:

use DBI;
my $sql_query_string =
    "SELECT Genus, Species FROM Plants WHERE Genus = 'Drosera'";
my $dsn = "DBI:mysql:database=plants;host=localhost;port=3306";
    # $dsn (the data source name) names the driver to use (DBD::mysql),
    # and the database, server and port to connect to
my $dbh = DBI->connect( $dsn, "Username", "Password1" )
    or die "Can't connect to server: $DBI::errstr\n";
    # create a database handle and connect it to the database
my $sth = $dbh->prepare( $sql_query_string );
    # prepare the query, giving a statement handle...
$sth->execute();
    # ...then run it
while ( my $row = $sth->fetchrow_hashref() ) {
    # each row is a hashref keyed on column name
    print "$row->{Genus} $row->{Species}\n";
}
$dbh->disconnect();

Again, for more details, perldoc DBI.

Handling files portably

The File:: modules have various utilities for messing with files. Perl has the built-in rename function, which can be used both to rename files and to move them about (though on Unix-like systems it usually won't work across filesystem boundaries):

rename "C:/flunge.log", "D:/piffle.log";

But it doesn’t have a native copy function (although it’s quite easy to roll your own, badly):

local $/ = undef;          # undefine the input record separator, so...
my $whole_file = <$IN>;    # ...this slurps in the whole file from the $IN filehandle
print $OUT $whole_file;    # then print it to another filehandle, $OUT

File::Copy allows you to copy files about with ease and without worry:

use File::Copy;
copy( "D:/index.html", "F:/backup/index.html" ) or die "Copy failed: $!";
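Here is a self-contained sketch of File::Copy's copy() and move(). move() is worth knowing about too: it renames the file if it can, and otherwise copies it to the new location and deletes the original, so it also works where the built-in rename would fail. The temporary directory and the file names are invented for the example:

```perl
use strict;
use warnings;
use File::Copy qw(copy move);
use File::Spec;
use File::Temp qw(tempdir);

# A throwaway directory (deleted at exit) stands in for D:/ and F:/backup/
my $dir      = tempdir( CLEANUP => 1 );
my $original = File::Spec->catfile( $dir, 'index.html' );
my $backup   = File::Spec->catfile( $dir, 'index.bak' );
my $moved    = File::Spec->catfile( $dir, 'moved.html' );

open my $out, '>', $original or die "Can't write $original: $!";
print $out "<html></html>\n";
close $out or die "Can't close $original: $!";

# copy() and move() return true on success, and set $! on failure
copy( $original, $backup ) or die "Copy failed: $!";
move( $backup, $moved )    or die "Move failed: $!";

print -e $backup ? "backup is still there\n" : "backup has been moved\n";
printf "%d bytes in %s\n", -s $moved, $moved;
```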

File::Find allows you to traverse a directory tree recursively, and do stuff to files in it:

use File::Find;
find( \&wanted, "D:/perl" );
    # find takes a list of arguments
    # the first is a reference to a subroutine to run each time a file is found
    # the rest are the directories to search, here just one item, D:/perl
sub wanted {
    # find() changes into each directory as it goes, so the bare name
    # of the file found is put in $_, and its full path in $File::Find::name
    if ( /\.html?$/ ) { print "Found an HTML file $File::Find::name\n"; }
}
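To try File::Find without a D:/perl directory to hand, this sketch builds a small throwaway tree with File::Temp and File::Path (the file names are made up) and collects the HTML files it contains, using an anonymous subroutine in place of a named wanted():

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a small throwaway tree to search, deleted when the script exits
my $root = tempdir( CLEANUP => 1 );
make_path( File::Spec->catdir( $root, 'docs', 'old' ) );
for my $file ( 'index.html', 'notes.txt', 'docs/guide.htm', 'docs/old/legacy.html' ) {
    my $path = File::Spec->catfile( $root, split m{/}, $file );
    open my $fh, '>', $path or die "Can't create $path: $!";
    close $fh;
}

my @html_files;
find(
    sub {
        # find() has changed into the directory, so $_ is the bare filename
        push @html_files, $File::Find::name if -f $_ && /\.html?$/;
    },
    $root,
);

print "$_\n" for sort @html_files;
print scalar(@html_files), " HTML files found\n";
```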

You’ll also want to look in the File:: and Cwd namespaces if you ever find yourself wanting to create a temporary file, concatenate file and directory names in a platform-independent way, or parse a filename into drive, path and file:

use 5.14.1;
use File::Spec;
use Cwd;

my $cwd = getcwd;
    # imported from Cwd
say File::Spec->catfile( $cwd, 'subdirectory', 'filename.log' );
    # catfile concatenates a list of directories and a filename with the appropriate / or \
    # catdir does similar but for a list of directories

my $absolute_path = File::Spec->rel2abs( '..\Python' );
say $absolute_path;

my ($drive, $directories, $file ) =
    File::Spec->splitpath( 'H:\\Perl\\bin\\h2xs.bat' );
say "# $_" for $drive, $directories, $file;

Run from H:\Perl on Windows, this prints:

H:\Perl\subdirectory\filename.log
H:\Python
# H:
# \Perl\bin\
# h2xs.bat
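The example above doesn't show the temporary-file part. Here is a minimal sketch using File::Temp's tempfile(), along with catdir() and its inverse, splitdir(); the directory names are hypothetical:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# A temporary file that is deleted automatically when the program exits
my ( $fh, $filename ) = tempfile( SUFFIX => '.log', UNLINK => 1 );
print $fh "scratch data\n";
close $fh or die "Can't close $filename: $!";
print "Temporary file: $filename\n";

# catdir builds a directory path with the local separator;
# splitdir takes one apart again
my $dir   = File::Spec->catdir( 'data', 'plants', 'drosera' );
my @parts = File::Spec->splitdir($dir);
print "$dir has ", scalar(@parts), " parts: @parts\n";
```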

Using internet protocols through Perl

The Net modules are Perl implementations of Internet protocols like FTP:

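use Net::FTP;
my $ftp = Net::FTP->new( "ftp.myhost.com" )
    or die "Can't connect: $@";
    # connect to server, note this is an OO module
$ftp->login( "Bob", "Password1" ) or die "Can't log in: ", $ftp->message;
$ftp->cwd( "/files" )             or die "Can't change directory: ", $ftp->message;
$ftp->get( "the_one_i_want.txt" ) or die "Download failed: ", $ftp->message;
$ftp->quit();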

Similar implementations of many other Internet protocols are available, such as Net::SMTP and Net::POP3; perldoc Net::blah for each one's documentation.

Manipulating lists

List::Util is a good place to look to avoid wheel reimplementations:

use List::Util qw(first max min reduce shuffle sum);
my @list     = ( 1, 32, 8, 4, 16 );
my $max      = max @list;
my $min      = min @list;
my $sum      = sum @list;
my $first    = first { $_ > 10 } @list;
my @shuffled = shuffle @list;
my $product  = reduce { $a * $b } @list;
print <<"__REPORT__";
Max:      $max
Min:      $min
Sum:      $sum
First:    $first
Shuffled: @shuffled
Product:  $product
__REPORT__

which prints something like this (the shuffled order will differ from run to run):

Max:      32
Min:      1
Sum:      61
First:    32
Shuffled: 8 4 32 16 1
Product:  16384

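reduce calls the block you pass it repeatedly, setting $a to the result so far and $b to the next element (much as sort uses $a and $b), so it can collapse a list to a single value in all sorts of ways. For the most common reductions the module already provides ready-made functions (max, min and sum among them), and List::MoreUtils has even more.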
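To make the $a/$b mechanics concrete, here is reduce re-implementing max, joining the list into a string, and summing with an explicit starting value, using the same list as above:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @list = ( 1, 32, 8, 4, 16 );

# $a holds the result so far and $b the next element, so each call
# folds one more element into the running result
my $max    = reduce { $a > $b ? $a : $b } @list;
my $joined = reduce { "$a-$b" } @list;
my $total  = reduce { $a + $b } 0, @list;    # 0 seeds the result, so an empty list gives 0

print "max:    $max\n";       # 32
print "joined: $joined\n";    # 1-32-8-4-16
print "total:  $total\n";     # 61
```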

Graphical user interfaces

I wouldn’t necessarily recommend Tk these days (I’d probably suggest Wx, though I haven’t actually used it), but sometimes you want something a little easier on the eye than a black box with a prompt in it:

use Tk;
my $mw = MainWindow->new; # Make a new window
$mw->title( "My first little GUI" );
my $button = $mw->Button(
    # Create a button, configure it with a (-key => value) hash
    -text    => "Hello world",
    -command => sub { exit(0) },
        # the -command key takes a coderef as its value, here to exit
);
$button->pack();
    # The button needs to be packed by the geometry manager into
    # the MainWindow to be visible
MainLoop();
    # Start the main event loop that handles the button clicks, etc.

Templating

HTML::Template is useful for creating HTML files from templates (*surprise*), but it is also a useful grounding for other, more complex templating engines. The module allows three main constructs in the HTML template: variables, loops and conditionals, which is about as much complexity as you can embed into HTML without severely entangling the design with the technology. Here is a simple template for a list of species in a genus of plants:

<html>
  <head>
    <title><TMPL_VAR NAME="GENUS"></title>
  </head>
  <body>
    <h1>Genus <TMPL_VAR NAME="GENUS"></h1>
    <p>Species:</p>
    <ul>
      <TMPL_LOOP NAME="SPECIES">
      <li><TMPL_VAR NAME="EPITHET"> <TMPL_VAR NAME="AUTHORITY">
      <TMPL_IF NAME="COMMON_NAME">
        [<TMPL_VAR NAME="COMMON_NAME">]
      </TMPL_IF>
      <TMPL_IF NAME="IUCN"> - IUCN status <TMPL_VAR NAME="IUCN">
        <TMPL_ELSE> - Conservation status unknown
      </TMPL_IF>
      </li>
    </TMPL_LOOP>
    </ul>
  </body>
</html>

Filling in the template is a straightforward matter:

use strict;
use HTML::Template;

my $template = HTML::Template->new( filename => "monograph.html" );
$template->param( GENUS => 'Sarracenia' );
my @species;
while ( <DATA> ) {
    chomp;
    next if /^\s*$/;
    my ( $epithet, $authority, $common_name, $iucn ) = split /\s*:\s*/;
    push @species, {
        EPITHET     => $epithet,
        AUTHORITY   => $authority,
        COMMON_NAME => $common_name,
        IUCN        => $iucn,
    };
}
@species = sort { $a->{EPITHET} cmp $b->{EPITHET} } @species;
$template->param( SPECIES => \@species );
print $template->output;

__DATA__
alata : Alph.Wood : Pale pitcher plant : NT
flava : L. : Yellow pitcher plant : LC
leucophylla : Raf. : White pitcher plant : VU
minor : Walt. : Hooded pitcher plant : LC
oreophila : (Kearney) Wherry : Green pitcher plant : CR
psittacina : Michx. : Parrot pitcher plant : LC
purpurea : L. : Purple pitcher plant :
rubra : Walt. : Sweet pitcher plant :

HTML::Template has three important methods. The first is new():

my $template = HTML::Template->new( filename => "monograph.html" );

This creates a templating object which will fill in the gaps in a file called monograph.html, which is the HTML-ish file shown above. The second important method is param(), which takes a hash of name => value pairs:

$template->param( TMPL_VARIABLE_NAME => "value to substitute in" );
$template->param( GENUS => 'Sarracenia' );

Any occurrence of the tag:

<TMPL_VAR NAME="GENUS">

in the template will be replaced with the value Sarracenia when you call the third method, output():

print $template->output;

and the template object duly fills in the gap:

<h1>Genus Sarracenia</h1>

The module also allows for conditionals and loops. To create a loop, rather than passing a simple value, you pass a reference to an array of hashrefs:

my @species;
while ( <DATA> ) {
    chomp;
    next if /^\s*$/;
    my ( $epithet, $authority, $common_name, $iucn ) = split /\s*:\s*/;
    push @species, {
        EPITHET     => $epithet,
        AUTHORITY   => $authority,
        COMMON_NAME => $common_name,
        IUCN        => $iucn,
    };
}
$template->param( SPECIES => \@species );

which generates something like this in the output:

<li>alata Alph.Wood [Pale pitcher plant] - IUCN status NT</li>
<li>flava L. [Yellow pitcher plant] - IUCN status LC</li>
<li>...

If you pass the param() method a ( SPECIES => \@array_of_hashrefs ) pair, the module will look for a corresponding <TMPL_LOOP NAME="SPECIES"></TMPL_LOOP> pair in the template. So in this case, the script builds an array, @species, of { EPITHET => "flava", AUTHORITY => "L.", etc } hashrefs and passes a reference to it as SPECIES. When the template is filled in, the body of the loop is output once for each hashref, with <TMPL_VAR NAME="EPITHET">, <TMPL_VAR NAME="AUTHORITY"> and so on set to the corresponding values from that hashref.

You’ll also notice that testing for conditionals is just as easy:

<TMPL_IF NAME="IUCN">
    IUCN status <TMPL_VAR NAME="IUCN">
<TMPL_ELSE>
    Conservation status unknown
</TMPL_IF>

In the script, we set a parameter called IUCN for each species. In the template, if this is true, the HTML between <TMPL_IF NAME="IUCN"> and </TMPL_IF> is filled in and output. You can also (as we have done here) specify a <TMPL_ELSE> within this structure, whose contents are filled in and output if IUCN is false.

I also use Win32::OLE and Parse::RecDescent a huge amount, but they will be posts all of their very own.

Next up…packages and writing modules.


polypompholyx
