This is a modified chapter of Mastering Perl by brian d foy and published by O'Reilly Media. It may differ significantly from that in the book due to corrections, additions of new material, or removal of obsolete material. This material is online for community review in preparation for a second edition.
This work is copyrighted under a contract between O'Reilly Media and brian d foy, and you cannot repost it or distribute it without permission.
Perl has excellent tools for creating, testing, and distributing modules. On the other hand, Perl's good for writing stand-alone programs that don't need anything else to be useful. I want my programs to be able to use the module development tools and be testable in the same way as modules. To do this, I restructure my programs to turn them into modulinos.
main Thing
Other languages aren't as DWIM as Perl, and they make us create a top-level subroutine that serves as the starting point for the application. In C or Java, I have to name this subroutine main:
/* hello_world.c */
#include <stdio.h>
int main ( void ) {
    printf( "Hello C World!\n" );
    return 0;
}
Perl, in its desire to be helpful, already knows this and does it for me. My entire program is the main routine, which is how Perl ends up with the default package main.
When I run my Perl program, Perl starts to execute the code it contains as if I had wrapped my main subroutine around the entire file.
In a module most of the code is in methods or subroutines, so most of it doesn't immediately execute. I have to call a
subroutine to make something happen. Try that with your favorite module; run it from the command line. In most cases, you won't see
anything happen. I can use perldoc's -l switch to locate the
actual module file so I can run it to see nothing happen:
$ perldoc -l Astro::MoonPhase
/usr/local/lib/perl5/site_perl/5.8.7/Astro/MoonPhase.pm
$ perl /usr/local/lib/perl5/site_perl/5.8.7/Astro/MoonPhase.pm
I can write my program as a module, then decide at run time how to treat the code. If I run my file as a program, it will act just like a program, but if I include it as a module, perhaps in a test suite, then it won't run the code and it will wait for me to do something. This way I get the benefit of a stand-alone program while using the development tools for modules.
My first step takes me backwards in Perl evolution. I need to get that main routine back and
then run it only when I decide I want to run it. For simplicity, I'll do this with a "Just another Perl hacker" (JAPH) program, but
I'll develop something more complex later.
Normally, Perl's version of "Hello World" is simple, but I've thrown in package main just for
fun and used the string "Just another Perl hacker," instead. I don't need package main for anything other than reminding the next maintainer
what the default package is. I'll use this idea later:
#!/usr/bin/perl
package main;
print "Just another Perl hacker, \n";
Obviously, when I run this program, I get the string as output. I don't want that in this case, though. I want it to behave more like a module so when I run the file, nothing appears to happen. Perl compiles the code, but doesn't have anything to execute. I wrap the entire program in its own subroutine:
#!/usr/bin/perl
package main;
sub run {
    print "Just another Perl hacker, \n";
}
The print statement won't run until I execute the subroutine, and now I have to figure out when
to do that. I have to know how to tell the difference between a program and a module.
The caller built-in tells me about the call stack, which lets me know where I am in Perl's descent into my program. Programs and modules can use caller too; I don't have to use it in a subroutine. If I use caller in the top level of a file I run as a program, it returns nothing because I'm already at the top level, the root of the entire program. Since caller returns something at the top level of a file I load as a module, and returns nothing when I run the same file as a program, I have what I need to decide how to act depending on how I'm called:
#!/usr/bin/perl
package main;
run() unless caller();
sub run {
    print "Just another Perl hacker, \n";
}
I'm going to save this program in a file, but now I have to decide how to name it. Its schizophrenic nature doesn't suggest a
file extension, but I want to use this file as a module later, so I could go along with the module file naming convention, which
adds a .pm to the name. That way, I can use it and Perl can find it just
as it finds other modules. Still, the terms program and module get in the way because it's really both. It's not a
module in the usual sense, though, and I think of it as a tiny module, so I call it a modulino.
Now that I have my terms straight, I save my modulino as Japh.pm. It's in my current directory, so I
also want to ensure that Perl will look for modules there (i.e., that "." is in @INC, the module search path). Since Perl 5.26, "." isn't in @INC by default, so I add it with the -I switch. I check the behavior of my
modulino. First, I use it as a module. From the command line, I can load a module with the -M
switch. I use a "null program", which I specify with the -e switch. When I load it as a module,
nothing appears to happen:
$ perl -I. -MJaph -e 0
$
Perl compiles the module, then goes through the statements it can execute immediately. It executes caller, which, in the boolean context of the unless, returns the name of the package that loaded my modulino (in list context it would also give me the filename and line number). That's a true value, so
the unless catches it and doesn't call run(). I'll do more with
this in a moment.
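To see exactly what caller hands back at the top level of a file that something else loads, here's a small sketch. It writes a hypothetical Probe.pm (the file name and the @Probe::Loaded_by variable are made up for this demonstration) that saves the result of caller in list context, then loads it with require from a temporary directory:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write a throwaway module that records what caller() returns at file scope.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'Probe.pm' );
open my $out, '>', $file or die "Could not write $file: $!";
print {$out} <<'PROBE';
package Probe;
# At the top level of a loaded file, caller() describes whoever loaded it.
our @Loaded_by = caller();
1;
PROBE
close $out or die "Could not close $file: $!";

require $file;   # this statement is the "caller" that Probe.pm sees

my ( $package, $filename, $line ) = @Probe::Loaded_by;
print "caller returned ", scalar @Probe::Loaded_by, " items\n";
print "package: $package, file: $filename, line: $line\n";
```

The package is main because that's where the require happens; with perl -MJaph -e 0, the loader is the -e program instead.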
Now I want to run Japh.pm as a program. This time, caller returns
nothing because it is at the top level. The unless condition is false, so Perl invokes run() and I see the output. The only difference is how I called the file. As a module it does module
things, and as a program it does program things. Here I run it as a program and get output:
$ perl Japh.pm
Just another Perl hacker,
$
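I can check both behaviors from a single script, too. This self-contained sketch writes the modulino into a temporary directory (the scratch directory and the trailing 1;, which I add so require is sure to get a true value, are only for this demonstration), runs it as a program with the same perl that runs the script ($^X), and then loads it with require while anything it prints goes into a string instead of standard output:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write the modulino into a scratch directory.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'Japh.pm' );
open my $out, '>', $file or die "Could not write $file: $!";
print {$out} <<'MODULINO';
package main;
run() unless caller();
sub run {
    print "Just another Perl hacker, \n";
}
1;
MODULINO
close $out or die "Could not close $file: $!";

# As a program: caller() is empty at the top level, so run() executes.
my $as_program = `"$^X" "$file"`;

# As a module: caller() is true, so run() is skipped. Anything printed
# while loading would land in $as_module because of the select.
my $as_module = '';
open my $capture, '>', \$as_module or die "Could not open in-memory handle: $!";
my $previous = select $capture;
require $file;
select $previous;
close $capture;

print "Run as a program:   $as_program";
print "Loaded as a module: ", ( length($as_module) ? $as_module : "(nothing)\n" );
```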
Now that I have the basic framework of a modulino, I can take advantage of its benefits. Since my program doesn't execute if I include it as a module, I can load it into a test program without it doing anything. I can use all of the Perl testing framework to test programs too.
If I write my code well, separating things into small subroutines that only do one thing, I can test each subroutine on its own.
Since the run subroutine does its work by printing, I use Test::Output to capture standard output and compare the result:
use Test::More tests => 2;
use Test::Output;
use_ok( 'Japh' );
stdout_is( sub{ main::run() }, "Just another Perl hacker, \n" );
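Test::Output isn't part of the standard Perl distribution. If it isn't installed, a rough stand-in using only core modules can select an in-memory filehandle while the subroutine runs and compare what it printed. This is a different, cruder technique than Test::Output's (it only catches output that goes through the currently selected handle, not direct writes to STDOUT or output from child processes), and capture_stdout is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 1;

# Stand-in for the modulino's run(); a real test would load Japh instead.
sub run { print "Just another Perl hacker, \n" }

# Hypothetical helper: call a code ref and return what it printed to the
# currently selected output handle.
sub capture_stdout {
    my ($code) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Could not open in-memory handle: $!";
    my $previous = select $fh;
    my $ok = eval { $code->(); 1 };
    my $error = $@;
    select $previous;
    close $fh;
    die $error unless $ok;
    return $buffer;
}

is( capture_stdout( sub { main::run() } ), "Just another Perl hacker, \n",
    'run() prints the JAPH' );
```

Test::More keeps its own copy of the original standard output, so its "ok" lines aren't swallowed by the select.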
This way, I can test each part of my program until I finally put everything together in my run() subroutine, which now looks more like what I would expect from a program in C, where the main loop calls everything in the right order.
There are a variety of ways to make a Perl distribution, and we covered these in Chapter 15 of Intermediate Perl. If I start
with a program that I already have, I like to use my scriptdist program, which is available on
CPAN (and beware, because everyone seems to write this program for themselves at some point). It builds a distribution around the
program based on templates I created in ~/.scriptdist, so I can make the distro any way that I like, which
also means that you can make it any way that you like, not just my way. At this point, I need the basic tests and a Makefile.PL to control the whole thing, just as I do with normal modules. Everything ends up in a directory
named after the program but with .d appended to it. I typically don't use that directory name for
anything other than a temporary placeholder since I immediately import everything into source control. Notice I leave myself a
reminder that I have to change into the directory before I do the import. It only took me fifty or sixty times to figure that
out:
$ scriptdist Japh.pm
Home directory is /Users/brian
RC directory is /Users/brian/.scriptdist
Processing Japh.pm...
Making directory Japh.pm.d...
Making directory Japh.pm.d/t...
RC directory is /Users/brian/.scriptdist
cwd is /Users/brian/Dev/mastering_perl/trunk/Scripts/Modulinos
Checking for file [.cvsignore]... Adding file [.cvsignore]...
Checking for file [.releaserc]... Adding file [.releaserc]...
Checking for file [Changes]... Adding file [Changes]...
Checking for file [MANIFEST.SKIP]... Adding file [MANIFEST.SKIP]...
Checking for file [Makefile.PL]... Adding file [Makefile.PL]...
Checking for file [t/compile.t]... Adding file [t/compile.t]...
Checking for file [t/pod.t]... Adding file [t/pod.t]...
Checking for file [t/prereq.t]... Adding file [t/prereq.t]...
Checking for file [t/test_manifest]... Adding file [t/test_manifest]...
Adding [Japh.pm]...
Copying script...
Opening input [Japh.pm] for output [Japh.pm.d/Japh.pm]
Copied [Japh.pm] with 0 replacements
Creating MANIFEST...
------------------------------------------------------------------
Remember to commit this directory to your source control system.
In fact, why not do that right now? Remember, `cvs import` works
from within a directory, not above it.
------------------------------------------------------------------
Inside the Makefile.PL I only have to make a few minor adjustments to the usual module setup so it
handles things as a program. I put the name of the program in the anonymous array for EXE_FILES
and ExtUtils::MakeMaker will do the rest.
When I run make install, the program ends up in the right place (also based on the PREFIX setting). If I want to install a manpage, instead of using MAN3PODS, which is for programming support documentation, I use MAN1PODS, which is for application documentation:
use ExtUtils::MakeMaker;
my $script_name = 'Japh.pm';   # the program's file name
WriteMakefile(
    'NAME'      => $script_name,
    'VERSION'   => '0.10',
    'EXE_FILES' => [ $script_name ],
    'PREREQ_PM' => {},
    'MAN1PODS'  => {
        $script_name => "\$(INST_MAN1DIR)/$script_name.1",
    },
    clean => { FILES => "*.bak $script_name-*" },
);
An advantage of EXE_FILES is that ExtUtils::MakeMaker modifies the shebang line to point to the path of the
perl binary that I used to run Makefile.PL. I don't have to worry about
the location of perl.
Once I have the basic distribution set up, I start off with some basic tests. I'll spare you the details since you can look in
scriptdist to see what it creates. The compile.t test simply
ensures that everything at least compiles. If the program doesn't compile, there's no sense going on. The pod.t file checks the program documentation for Pod errors (see Chapter 15 for more details on Pod), and
the prereq.t test ensures that I've declared all of my prerequisites in Makefile.PL. These are the
tests that clear up my most common mistakes (or at least the most common ones before I started using these test files with all of
my distributions).
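I won't reproduce scriptdist's compile.t, but the idea is easy to sketch. This version is my own illustration, not the file scriptdist generates: it asks the perl running the tests to compile the program with -c and looks for perl's "syntax OK" message. The path blib/script/Japh.pm is where make copies the program; compiles is a hypothetical helper, and the test skips itself if make hasn't run yet:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Test::More;

# Hypothetical helper: true if `perl -c` can compile the file.
sub compiles {
    my ($file) = @_;
    my $output = `"$^X" -c "$file" 2>&1`;
    return $? == 0 && $output =~ /syntax OK/;
}

my $program = File::Spec->catfile( qw(blib script Japh.pm) );

SKIP: {
    skip "$program not built yet; run make first", 1 unless -e $program;
    ok( compiles($program), "$program compiles" );
}

done_testing();
```

Since -c only compiles, the modulino's run() unless caller() line never executes, so the check works even for programs that would otherwise do real work.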
Before I get started, I'll check to ensure everything works correctly. Now that I'm treating my program as a module, I'll test it every step of the way. The program won't actually do anything until I run it as a program, though:
$ cd Japh.pm.d
$ perl Makefile.PL; make test
Checking if your kit is complete...
Looks good
Writing Makefile for Japh.pm
cp Japh.pm blib/lib/Japh.pm
cp Japh.pm blib/script/Japh.pm
/usr/local/bin/perl "-MExtUtils::MY" -e "MY->fixin(shift)" blib/script/Japh.pm
/usr/local/bin/perl "-MTest::Manifest" "-e" "run_t_manifest(0, 'blib/lib', 'blib/arch', )"
Level is
Test::Manifest::test_harness found [t/compile.t t/pod.t t/prereq.t]
t/compile....ok
t/pod........ok
t/prereq.....ok
All tests successful.
Files=3, Tests=4, 6 wallclock secs ( 3.73 cusr + 0.48 csys = 4.21 CPU)
Now that I have all of the infrastructure in place, I want to further develop the program. Since I'm treating it as a module, I want to add additional subroutines that I can call when I want it to do the work. These subroutines should be small and easy to test. I might even be able to reuse these subroutines by simply including my modulino in another program. It's just a module, after all, so why shouldn't other programs use it?
First, I move away from a hard-coded message. I'll do this in baby steps to illustrate the development of the modulino, and the
first thing I'll do is move the actual message to its own subroutine. That hides the message to print behind an interface, and
later I'll change how I get the message without having to change the run subroutine. I'll also be
able to test message separately. At the same time, I'll put the entire program in its own package,
which I'll call Japh. That helps compartmentalize anything I do when I want to test the modulino
or use it in another program:
#!/usr/bin/perl
package Japh;
run() unless caller();
sub run {
    print message(), "\n";
}
sub message {
    'Just another Perl hacker, ';
}
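Because the code now lives in package Japh, another program can load the modulino and call its subroutines by their full names without triggering run(). Here's a self-contained sketch of that reuse. It writes this version of Japh.pm into a temporary directory only so the example runs on its own (the trailing 1; is my addition), puts that directory at the front of @INC, and loads the file by module name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', File::Spec->catfile( $dir, 'Japh.pm' )
    or die "Could not write Japh.pm: $!";
print {$out} <<'MODULINO';
package Japh;
run() unless caller();
sub run {
    print message(), "\n";
}
sub message {
    'Just another Perl hacker, ';
}
1;
MODULINO
close $out or die "Could not close Japh.pm: $!";

unshift @INC, $dir;   # let require find Japh.pm by module name
require Japh;         # caller() is true here, so run() doesn't fire

my $japh = Japh::message();
print "Got '$japh' without running the program\n";
```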
I can add another test file to the t/ directory now. My first test is simple. I check that I can use the modulino and that my new subroutine is there. I won't get into testing the actual message yet
since I'm about to change that[1]:
# t/message.t
use Test::More tests => 2;
use_ok( 'Japh' );
ok( defined &Japh::message, 'Japh has a message subroutine' );
Now I want to be able to configure the message. At the moment it's in English, but maybe I don't always want that. How am I going to get the message in other languages? I could do all sorts of fancy internationalization things, but for simplicity I'll create a configuration file that maps each locale to a template string in that locale's language:
en_US "Just another %s hacker, "
eu_ES "apenas otro hacker del %s, "
fr_FR "juste un autre hacker de %s, "
de_DE "gerade ein anderer %s Hacker, "
it_IT "appena un altro hacker del %s, "
I add some bits to read the language file. I need a subroutine, get_template, to read the file and pick the correct template for the locale. Since
that subroutine returns a template string, message has to fill it in with sprintf. I also add another subroutine, get_topic, to return the
type of hacker I am. I won't branch out into the various ways I can get the topic, although you can see how I'm moving the program
away from doing (or saying) one thing to making it much more flexible:
sub run
{
    my $template = get_template();
    print message( $template ), "\n";
}
sub message
{
    my $template = shift;
    return sprintf $template, get_topic();
}
sub get_topic { 'Perl' }
sub get_template { ... } # shown later; the ... placeholder dies if called
I can add some tests to ensure that my new subroutines still work and also check that the previous tests still work.
Being quite pleased with myself that my modulino now works in many languages and that the message is configurable, I'm
disappointed to find out that I've just introduced a possible problem. Since the user can decide the format string, he can do
anything that printf allows him to do[2], and that's quite a bit. I'm
using user-defined data to run the program, so I should really turn on taint checking (see Chapter 3), but even better than that, I
should get away from the problem rather than trying to put a bandage on it.
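To make the printf problem concrete, here's a sketch with a hypothetical hostile entry in the configuration file. Perl's sprintf supports explicit argument indexes and field widths, so one template line can make the program build a multi-megabyte string out of a four-letter topic. The template and the five-million width are made up for illustration; message is the sprintf version from above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub get_topic { 'Perl' }

# The sprintf-based message() from the text.
sub message {
    my $template = shift;
    return sprintf $template, get_topic();
}

# What I expect to find in the config file...
my $good = message( 'Just another %s hacker, ' );

# ...and what a user could put there instead. %1$s reuses the first
# argument, and the width pads the second copy with millions of spaces.
my $evil = message( 'Just another %1$s hacker, %1$5000000s' );

print "good: '$good'\n";
print "evil: ", length($evil), " characters\n";
```

Nothing here runs arbitrary code, but the person editing the configuration file, not the program, now decides how much work the program does.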
Instead of printf, I'll use the Template module. My format strings will turn into templates:
en_US "Just another [% topic %] hacker, "
eu_ES "apenas otro hacker del [% topic %], "
fr_FR "juste un autre hacker de [% topic %], "
de_DE "gerade ein anderer [% topic %] Hacker, "
it_IT "Solo un altro hacker del [% topic %], "
Inside my modulino, I'll include the Template module
and configure the Template parser so it doesn't
evaluate Perl code. I only need to change message because nothing else needs to know how message does its work:
sub message {
    my $template = shift;
    require Template;
    my $tt = Template->new(
        INCLUDE_PATH => '',
        INTERPOLATE  => 0,
        EVAL_PERL    => 0,
    );
    # process() returns false on failure; error() says what went wrong
    $tt->process( \$template, { topic => get_topic() }, \ my $cooked )
        or die $tt->error;
    return $cooked;
}
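If I didn't want the Template dependency, a deliberately tiny substitute would make the same point: only names I supply get replaced, and nothing in the configuration file can run code or control formatting. This is a different technique from the Template-based message above, and fill_template is my own hypothetical helper; it understands only the [% name %] form and none of the rest of Template's language:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub get_topic { 'Perl' }

# Hypothetical core-only replacement for the Template-based message().
# Replaces [% name %] with a value from %vars; unknown names stay as-is.
sub fill_template {
    my ( $template, %vars ) = @_;
    $template =~ s{(\[%\s*(\w+)\s*%\])}
                  { exists $vars{$2} ? $vars{$2} : $1 }ge;
    return $template;
}

sub message {
    my $template = shift;
    return fill_template( $template, topic => get_topic() );
}

print message( 'Just another [% topic %] hacker, ' ), "\n";
print message( 'gerade ein anderer [% topic %] Hacker, ' ), "\n";
print message( 'Just another %s [% system %] hacker, ' ), "\n";
```

The last line shows the safety property: a stray %s is just text, and an unknown placeholder passes through untouched.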
Now I have a bit of work to do on the distribution side. My modulino now depends on Template so I need to add that to the list of prerequisites. This way, CPAN (or CPANPLUS) will automatically detect the dependency and install it as it installs my modulino. That's just another benefit of wrapping the program in a distribution:
WriteMakefile(
    ...
    'PREREQ_PM' => {
        Template => '0',
    },
    ...
);
What happens if there is no configuration file, though? My message subroutine should still do
something, so get_template falls back to a default template, but it also issues a warning with carp so I know the
file is missing:
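use Carp qw(carp);
sub get_template {
    # locale: an argument if I pass one, then LANG, then US English
    my $locale  = shift || $ENV{LANG} || 'en_US';
    my $default = "Just another [% topic %] hacker, ";
    my $file    = "t/config.txt";
    # declare $fh outside any block so the while loop below can see it
    open my $fh, '<', $file or do {
        carp "Could not open '$file': $!";
        return $default;
    };
    while( <$fh> ) {
        chomp;
        # skip lines that don't look like: locale "template"
        my( $this_locale, $template ) = m/(\S+)\s+"(.*?)"/ or next;
        return $template if $this_locale eq $locale;
    }
    return $default;
}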
You know the drill by now: the new additions to the program require more tests. Again, I'll leave that up to you.
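Here's one way I might start, as a sketch rather than the distribution's actual test. To keep it self-contained, it carries its own copy of get_template (this copy takes the locale from its argument, then LANG, then en_US; a real test would load Japh instead). Because get_template reads the relative path t/config.txt, the test changes into a scratch directory, checks the missing-file case, then writes a small t/config.txt and checks a known and an unknown locale:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Carp qw(carp);
use Cwd qw(getcwd);
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use Test::More;

# Local copy of get_template so this file runs on its own.
sub get_template {
    my $locale  = shift || $ENV{LANG} || 'en_US';
    my $default = "Just another [% topic %] hacker, ";
    my $file    = "t/config.txt";
    open my $fh, '<', $file or do {
        carp "Could not open '$file': $!";
        return $default;
    };
    while ( <$fh> ) {
        my ( $this_locale, $template ) = m/(\S+)\s+"(.*?)"/ or next;
        return $template if $this_locale eq $locale;
    }
    return $default;
}

my $start = getcwd();
my $dir   = tempdir( CLEANUP => 1 );
chdir $dir or die "Could not change to $dir: $!";

{   # no t/config.txt yet: expect the default and one warning
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    is( get_template('fr_FR'), "Just another [% topic %] hacker, ",
        'missing config file falls back to the default' );
    is( scalar @warnings, 1, '... and carps about it' );
}

make_path('t');
open my $out, '>', 't/config.txt' or die "Could not write config: $!";
print {$out} <<'CONFIG';
en_US "Just another [% topic %] hacker, "
fr_FR "juste un autre hacker de [% topic %], "
CONFIG
close $out or die "Could not close config: $!";

is( get_template('fr_FR'), 'juste un autre hacker de [% topic %], ',
    'known locale picks its template' );
is( get_template('xx_XX'), "Just another [% topic %] hacker, ",
    'unknown locale falls back to the default' );

chdir $start or die "Could not change back to $start: $!";
done_testing();
```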
Finally, I need to test the whole thing as a program. I've tested the bits and pieces individually, but do they all work
together? To find out, I run the program as an external command and use the Test::Output module
to capture its output, which I compare with what I expect. How I do this for programs depends on what
the particular program is supposed to do. To run my program inside the test file, I wrap it in a subroutine and use the
value of $^X for the perl binary I should use. That will be the
same perl binary that's running the tests:
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Test::More 'no_plan';
use Test::Output;
my $script = File::Spec->catfile( qw(blib script Japh.pm ) );
sub run_program {
    print `"$^X" "$script"`;
}
# localize only LANG; local %ENV would also wipe PATH, PERL5LIB, and the rest
{ # test for US English
    local $ENV{LANG} = 'en_US';
    stdout_is( \&run_program, "Just another Perl hacker, \n" );
}
{ # test for Spanish
    local $ENV{LANG} = 'eu_ES';
    stdout_is( \&run_program, "apenas otro hacker del Perl, \n" );
}
{ # test with no LANG setting
    delete local $ENV{LANG};
    stdout_is( \&run_program, "Just another Perl hacker, \n" );
}
{ # test with nonsense LANG setting
    local $ENV{LANG} = 'blah blah';
    stdout_is( \&run_program, "Just another Perl hacker, \n" );
}
Once I create the program distribution, I can upload it to CPAN (or anywhere else that I like) so other people can download it.
To create the archive, I do the same thing I do for modules. First, I run make disttest, which
creates a distribution, unwraps it in a new directory, and runs the tests. That ensures that the archive I give out has the
necessary files and everything runs properly (well, most of the time):
$ make disttest
After that, I create the archive in whichever format I like:
$ make tardist
==OR==
$ make zipdist
Finally, I upload it to PAUSE and announce it to the world. In real life, however, I use the release utility that comes with Module::Release, and this (and much more) all happens in one step.
As a module living on CPAN, my modulino is a candidate for CPAN Testers, the loosely connected group of volunteers and automated computers that test just about every module. They don't test programs, but my modulino doesn't look like a program.
There is a little-known area of CPAN called "scripts" where people have uploaded stand-alone programs without the full distribution support[3]. Kurt Starsinic did some work on it to automatically index the programs by category, and his solution simply looks in the program's Pod documentation for a section called "SCRIPT CATEGORIES"[4]. If I wanted, I could add my own categories to that section, and the programs archive should automatically index those on its next pass.
=head1 SCRIPT CATEGORIES

CPAN/Administrative

=cut
I can create programs that look like modules. The entire program (outside of third-party modules) exists in a single file. Although it runs just like any other program, I can develop and test it just like a module. I get all the benefits of both forms, including testability, dependency handling, and installation. Since my program is a module, I can easily reuse parts of it in other programs, too.
"How a Script Becomes a Module" originally appeared on Perlmonks: http://www.perlmonks.org/index.pl?node_id=396759.
I also wrote about this idea for The Perl Journal in "Scripts as Modules". Although it's the same idea, I chose a completely different topic: turning the RSS feed from TPJ into HTML: http://www.ddj.com/dept/lightlang/184416165.
Denis Kosykh wrote "Test-Driven Development" for The Perl Review 1.0 (Summer 2004): http://www.theperlreview.com/Issues/subscribers.html.