This is a modified chapter of Mastering Perl by brian d foy and published by O'Reilly Media. It may differ significantly from that in the book due to corrections, additions of new material, or removal of obsolete material. This material is online for community review in preparation for a second edition.
This work is copyrighted under a contract between O'Reilly Media and brian d foy, and you cannot repost it or distribute it without permission.
Before I can do anything to improve my programs, I have to make a decision about what I am going to fix. Before I spend the time to do that, I want to figure out what I should focus on. How do I get the most improvement for the least amount of fiddling? What should I work on first? Through the process of profiling, by which I record and summarize what a program is doing, I can make those decisions. Luckily, Perl already offers several tools to do this.
I want to compute a factorial. It's the old saw of performance discussions, and I'll get to something more interesting in a moment. When I googled for "factorial subroutines", almost every implementation (aside from those in assembly language) was a recursive algorithm, meaning that a subroutine had to figure out part of the problem, then call itself with a subproblem, and keep doing that until there are no more subproblems, eventually working its way up to the original call. Here's how I'd write that in Perl:
#!/usr/bin/perl
# factorial-recurse.pl
sub factorial
{
return unless int( $_[0] ) == $_[0];
return 1 if $_[0] == 1;
return $_[0] * factorial( $_[0] - 1 );
}
print factorial( $ARGV[0] ), "\n";
Now I want to figure out how to improve this toy program. It's already pretty fast because Perl can't really count that high.
With anything over 170, my program on my machine returns Inf (more on that in a moment). Despite
that, I'll profile it anyway. I use the Devel::SmallProf module to get a quick summary. I invoke it with the -d switch, which already assumes the Devel portion of the name (see
Chapter 4):
% perl -d:SmallProf factorial.pl 170
The Devel::SmallProf module leaves behind a
human-readable text file named smallprof.out. In its columnar output, it shows each line of the
program, how many times I executed that line, and the real and CPU times for each line:
================ SmallProf version 1.15 ================
Profile of factorial.pl Page 1
=================================================================
count wall tm cpu time line
0 0.000000 0.000000 1:#!/usr/bin/perl
0 0.000000 0.000000 2:
170 0.000000 0.000000 3:sub factorial {
170 0.001451 0.000000 4: return unless int( $_[0] ) == $_[0];
170 0.004367 0.000000 5: return 1 if $_[0] == 1;
169 0.004371 0.000000 6: return $_[0] * factorial( $_[0] - 1 );
0 0.000000 0.000000 7: }
0 0.000000 0.000000 8:
1 0.000009 0.000000 9:print factorial( $ARGV[0] ), "\n";
To compute the factorial of 170, I had to call the subroutine 170 times. Each time (save for one!) I called that subroutine, I
had to execute the lines in the subroutine. I had to check that the argument was an integer every time, I had to check if the
argument was 1 every time, and in almost every case, I had to call the subroutine again. That's a
lot of work. By profiling my program, I can see what is taking up all the time, and then concentrate on improving those areas.
The best way to fix these problems is to come up with a better way to get the answer. Better algorithms get you better
performance than almost any other method. Instead of using a recursive solution, I changed it to an iterative one. I can easily get
the range of integers using the range operator, and in other languages, a C style for loop can
stand in:
#!/usr/bin/perl
# factorial-iterate.pl
sub factorial {
return unless int( $_[0] ) == $_[0];
my $f = 1;
foreach ( 2 .. $_[0] ) { $f *= $_ };
$f;
}
print factorial( $ARGV[0] ), "\n";
When I profile this program, I see that I did not have to do as much work. I didn't have as much code to run. I only had to
check the argument once, I didn't have to check if the argument was 1, and I don't have to make
repeated calls to a subroutine:
================ SmallProf version 1.15 ================
Profile of factorial2.pl Page 1
=================================================================
count wall tm cpu time line
0 0.000000 0.000000 1:#!/usr/bin/perl
0 0.000000 0.000000 2:
1 0.000000 0.000000 3:sub factorial {
1 0.000021 0.000000 4: return unless int( $_[0] ) == $_[0];
1 0.000000 0.000000 5: my $f = 1;
170 0.001632 0.000000 6: foreach ( 2 .. $_[0] ) { $f *= $_ };
1 0.002697 0.000000 7: $f;
0 0.000000 0.000000 8: }
0 0.000000 0.000000 9:
1 0.000006 0.000000 10:print factorial( $ARGV[0] ), "\n";
Earlier I said that my program topped out at 170. I can get past that limit by telling Perl to use the bignum pragma:
#!/usr/bin/perl
# factorial-recurse-bignum.pl
use bignum;
sub factorial {
return unless int( $_[0] ) == $_[0];
return 1 if $_[0] == 1;
return $_[0] * factorial( $_[0] - 1 );
}
print factorial( $ARGV[0] ), "\n";
Now I can see some real performance differences by comparing the factorials of really big numbers. As I was finishing this book, I switched to a MacBook Pro and its dual core architecture had no problem with speed in either of the approaches. Only with really large numbers did the recursive approach really slow down.
That's not the whole story, though. I've shown a really simple program that calculates a single number. In a real program I
would most likely use the factorial routine many, many times with several different values. When I
profile the application, I'll see the number of times I run the lines of the subroutine throughout the entire process.
Either approach can benefit from caching its results. Here's a program that repeatedly prompts me for a number. It computes the factorial and caches the results along the way, trading memory for speed. The first time I ask it to compute the factorial for 10,000, it takes several seconds. After that, when I ask it for the factorial for any number less than 10,000 it's just a very fast lookup:
#!/usr/bin/perl
# factorial-iterate-bignum-memo.pl
use bignum;
{
my @Memo = (1);
sub factorial {
my $number = shift;
return unless int( $number ) == $number;
return $Memo[$number] if $Memo[$number];
foreach ( @Memo .. $number )
{
$Memo[$_] = $Memo[$_ - 1] * $_;
}
$Memo[ $number ];
}
}
{
print "Enter a number> ";
chomp( my $number = <STDIN> );
exit unless defined $number;
print factorial( $number ), "\n";
redo;
}
I can do the same with the recursive solution, although the Memoize module does the extra work for me:
#!/usr/bin/perl
# factorial-recurse-bignum-memo.pl
use bignum;
use Memoize;
memoize( factorial );
sub factorial {
return unless int( $_[0] ) == $_[0];
return 1 if $_[0] == 1;
return $_[0] * factorial( $_[0] - 1 );
}
{
print "Enter a number> ";
chomp( my $number = <STDIN> );
exit unless defined $number;
print factorial( $number ), "\n";
redo;
}
While profiling, I must remember that some things in isolation don't tell me the whole story. The profile can help me make decisions, but I'm the one who has to do the thinking, not the computer.
Profiling means counting, and to count something, I need to make the statements do something so I can count them. I might, for instance, use some of the features from Chapter 4 to add accounting code to my subroutines. That's much too tedious, though. Instead of trying to account for things in individual subroutines, I try to make everything flow through a single control subroutine. This is probably too much for a small program, but in a large system the extra computing pays off in saved developer time when I work on optimizing the program.
The most common place I do this is in database code. In the database case, I want to track with queries I make, usually so I can get an idea of which queries take a long time or which ones I most frequently use. From that, I can figure out what I should optimize.
Here's an example of a nexus for all queries that allows me to profile my database code. I've simplified this example, but this
is close to some actual code I've used, minus some stuff that doesn't apply to profiling. I have a package-scoped lexical variable
%Queries that will hold my profile data. The simple_query method
is essentially a wrapper around prepare and execute with some
accounting overhead:
package My::Database;
my %Queries;
sub simple_query
{
my( $self, @args ) = @_;
my $sql_statement = shift @args;
$Queries{$sql_statement}++; # <--- Profiling hook
my $sth = $self->dbh->prepare( $sql_statement );
unless( ref $sth ) { warn $@; return }
my $rc = $sth->execute( @args );
wantarray ? ( $sth, $rc ) : $rc;
}
In the rest of my database code, I have functions that use simple_query instead of using the DBI interface directly. My get_postage_rates_by_country grabs the amount of postage I need to send mail overseas. It passes the SQL
statement and a bind parameter to simple_query. As before, this is real code, although I've cut
out some bits to only show the relevant parts:
sub get_postage_rates_by_country
{
my( $self, $country ) = @_;
my( $sth ) = $self->simple_query( <<"SQL", $country );
SELECT
PostageRates.ounces,
PostageRates.rate,
PostageServices.name
FROM
PostageRates, Countries, PostageServices
WHERE
Countries.pk = ?
AND
Countries.usps_zone = PostageRates.usps_zone
AND
PostageRates.service = PostageServices.pk
ORDER BY
PostageRates.ounces
SQL
return $sth->fetchall_arrayref;
}
As my program does its work, the queries flow through simple_query, which counts and records
what happens. To get the profile data, I use an END block to create the report. The particular
format depends on what I collected during the run. In this example I just counted statements, but I could use that %Queries hash to store anything I wanted, including the bind parameters, the function that called simple_query, and so on:
END {
foreach my $statement ( sort { $b <=> $a } keys %Queries )
{
printf "%5d %s\n\n", $Queries{$statement}, $statement;
}
}
I might find, for instance, that in a long report that I repeatedly fetch the postage data for each country, even though it's not going to change. When I realize I'm doing this after looking at the profile data, I can optimize my code to cache some of the data in memory rather than asking for the same answer in the database.
I've actually been coding my Perl database stuff like this for quite a while, and I recently found out that Tim Bunce added these features directly to DBI. He did the same sort of thing by making everything flow through a central function. That was really easy because DBI already does that for queries.
The DBI::Profile module can do much of the same
work but for my use of Perl's database interface module, DBI. Database interactions are often the
biggest performance drain on my programs, and that's a place I usually start to look for improvements. Instead of calling
subroutines unnecessarily, as in my last example, I might be making unnecessary database queries.
Here's a short program that takes quite a bit of time because it makes almost 2,000 database queries. I want to build a table of
number names, so given a digit I can get the name (e.g. 9 has the name "Nine"), or go from the name to the digit. I should probably
use a Lingua::* module, but then I don't want to start off with something smart. In this example,
I use the DBD::CSV module to use a comma-separated value
file as my database store. I create a table to hold the pairs, then start to populate the table. I bootstrap the data by getting
the first 19 names into the table, then looking up the names I already have to create further names:
#!/usr/bin/perl
# dbi-number-inserter.pl
use strict;
use DBI;
my $dbh = DBI->connect( "DBI:CSV:f_dir=." );
$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );
my $sth = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );
my $id = 1;
foreach my $name (
qw(One Two Three Four Five Six Seven Eight Nine Ten),
qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen
Nineteen)
)
{
$sth->execute( $id++, $name );
}
foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) )
{
$sth->execute( $id++, $name );
foreach my $ones_digit ( 1 .. 9 )
{
my( $ones_name ) = map { lc } $dbh->selectrow_array(
"SELECT name FROM names WHERE id = $ones_digit"
);
$sth->execute( $id++, "$name $ones_name" );
}
}
foreach my $digit ( 1 .. 9 )
{
my( $hundreds ) = $dbh->selectrow_array(
"SELECT name FROM names WHERE id = $digit"
);
$sth->execute( $id++, "$hundreds hundred" );
foreach my $tens_digit ( 1 .. 99 )
{
my( $tens_name ) = map { lc } $dbh->selectrow_array(
"SELECT name FROM names WHERE id = $tens_digit"
);
$sth->execute( $id++, "$hundreds hundred $tens_name" );
}
}
I run this from the command line, and it takes almost two minutes on my Powerbook G4. That's okay; I need a nice, slow example.
Now I want to profile this program to see where I can improve it, pretending I was just handed it without knowing how it works. I
set the DBI_PROFILE environment variable to turn on database profiling[1]. To get a report ordered by statements, I set DBI_PROFILE='!Statement'.
The sort key has an exclamation point, !, prepended to it. At the end of the run, I get a long
report. Here are the first several lines:
$ env DBI_PROFILE='!Statement' perl dbi-profile.pl
DBI::Profile: 109.671362s 99.70% (1986 calls) dbi-profile.pl @ 2006-10-10 02:18:40
'' =>
0.000784s / 10 = 0.000078s avg (first 0.000023s, min 0.000001s, max 0.000618s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
0.004258s
'DROP TABLE names' =>
0.008017s
'INSERT INTO names VALUES ( ?, ? )' =>
3.229462s / 1002 = 0.003223s avg (first 0.001767s, min 0.000037s, max 0.108636s)
'SELECT name FROM names WHERE id = 1' =>
1.204614s / 18 = 0.066923s avg (first 0.012831s, min 0.010301s, max 0.274951s)
'SELECT name FROM names WHERE id = 10' =>
1.118565s / 9 = 0.124285s avg (first 0.027711s, min 0.027711s, max 0.341782s)
'SELECT name FROM names WHERE id = 11' =>
1.136748s / 9 = 0.126305s avg (first 0.032328s, min 0.032328s, max 0.378916s)
The top line gives me the wallclock time and the total number of DBI method calls; that's the number of method calls to DBI, not the number of queries. After that, I get a report for each query, in lexical order. Just because it looks like it's sorted by total time or number of queries, don't forget to look at the rest of the report. It's actually sorted in alphabetical order of the query.
For each query, DBI::Profile reports the total wallclock time and the number of method calls for that statement. It doesn't report the CPU time because it isn't very interesting; the database server might be another machine, and even if it is local, it's often a separate process. It gives an average time for that query, and then the times for the first call, the call that took the least amount of time, and the call that took the most. This isn't as simple as timing a program. The database server might perform differently given the same input because it might be doing something else, the data size might be different, or many other things.
From the full report, I see that most calls took about the same amount of time since they are all running pretty quickly, so I can't make a big speed-up by optimizing a query so it performs better on the database. No indexing or re-arrangement of joins will likely help here.
What I really need to reduce is the number of queries so I interact with the database less. I can't get away from the INSERTs since I still have to make each row, but I don't need to make all of those select statements. I
should cache the result so I don't fetch the same data twice (or even at all):
#!/usr/bin/perl
# dbi-number-inserter-cached.pl
use strict;
use DBI;
my $dbh = DBI->connect( "DBI:CSV:f_dir=." );
$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );
my $insert = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );
my @array = ( qw( Zero ),
qw(One Two Three Four Five Six Seven Eight Nine Ten),
qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen
Nineteen)
);
my $id = 0;
foreach my $name ( @array )
{
$insert->execute( $id++, $name );
}
foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) )
{
$array[ $id ] = $name;
$insert->execute( $id++, $name );
foreach my $ones_digit ( 1 .. 9 )
{
my $full_name = $array[ $id ] = "$name $array[$ones_digit]";
$insert->execute( $id++, $full_name );
}
}
foreach my $digit ( 1 .. 9 )
{
my( $hundreds ) = $array[ $digit ];
my $name = $array[$id] = "$hundreds hundred";
$insert->execute( $id++, $name );
foreach my $tens_digit ( 1 .. 99 )
{
my( $tens_name ) = lc $array[ $tens_digit ];
$array[$id] = "$hundreds hundred $tens_name";
$insert->execute( $id++, "$name $tens_name" );
}
}
In my first pass at improvement, I don't have any select statements at all because I cache the results. That cuts out most of the runtime in this program. The times for each program are remarkably different. Remember, however, that I've made a trade-off between speed and memory. The second program is faster, but it takes up more memory:
$ time perl dbi-profile.pl
real 1m48.676s
user 1m21.136s
sys 0m1.698s
$ time perl dbi-profile2.pl
real 0m2.638s
user 0m1.736s
sys 0m0.307s
Here's the entire profile report for my new program, which now runs in two percent of the original runtime. Most of the calls
are INSERTs:
$ env DBI_PROFILE='!Statement' perl dbi-profile2.pl
DBI::Profile: 2.118577s 105.93% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:31:10
'' =>
0.000757s / 10 = 0.000076s avg (first 0.000021s, min 0.000001s, max 0.000584s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
0.004216s
'DROP TABLE names' =>
0.006906s
'INSERT INTO names VALUES ( ?, ? )' =>
2.106698s / 1003 = 0.002100s avg (first 0.001713s, min 0.000037s, max 0.005587s)
By looking at the profile, I was able to target part of the program for improvement. It didn't tell me how to improve it, but at least I know where I should spend my time.
The runtime report isn't the only one I can get. With DBI_PROFILE='!MethodName', DBI orders the report according to the name of the DBI function. It's in ASCII-betical order with the
uppercase letters sorting before the lowercase ones (and I've redacted part of these reports since they shows all of the
methods, including the ones I didn't even know I was using):
$ env DBI_PROFILE='!MethodName' perl dbi-profile2.pl
DBI::Profile: 2.168271s 72.28% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:37:16
'DESTROY' =>
0.000141s / 2 = 0.000070s avg (first 0.000040s, min 0.000040s, max 0.000101s)
'FETCH' =>
0.000001s
'STORE' =>
0.000067s / 5 = 0.000013s avg (first 0.000022s, min 0.000006s, max 0.000022s)
'do' =>
0.010498s / 2 = 0.005249s avg (first 0.006602s, min 0.003896s, max 0.006602s)
'execute' =>
2.155318s / 1000 = 0.002155s avg (first 0.002481s, min 0.001777s, max 0.007023s)
'prepare' =>
0.001570s
I can even combine the two since DBI::Profile can
deal with multiple sort keys if I join them with a colon. With DBI_PROFILE='!Statement:!MethodName', DBI gives me a double layer report. Under each SQL statement, it breaks the time down by
the particular function it used. I might, for instance, want to compare the time my database query spends in the DBI guts and actually fetching the data:
$ env DBI_PROFILE='!Statement:!MethodName' perl dbi-profile2.pl
DBI::Profile: 2.123325s 106.17% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:38:22
'' =>
'FETCH' =>
0.000001s
'STORE' =>
0.000069s / 5 = 0.000014s avg (first 0.000024s, min 0.000005s, max 0.000024s)
'connect' =>
0.000644s
'default_user' =>
0.000030s
'disconnect' =>
0.000050s
'disconnect_all' =>
0.000024s
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
'do' =>
0.004616s
'DROP TABLE names' =>
'do' =>
0.007191s
'INSERT INTO names VALUES ( ?, ? )' =>
'DESTROY' =>
0.000149s / 2 = 0.000075s avg (first 0.000050s, min 0.000050s, max 0.000099s)
'execute' =>
2.108945s / 1000 = 0.002109s avg (first 0.002713s, min 0.001796s, max 0.005454s)
'prepare' =>
0.001606s
I can flip that last report around by using DBI_PROFILE='!MethodName:!Statement'. The first
layer lists the DBI method and then breaks it down by SQL
statements after that:
$ env DBI_PROFILE='!MethodName:!Statement' perl dbi-profile2.pl
DBI::Profile: 2.431843s 81.06% (1015 calls) dbi-profile2.pl @ 2006-10-10 02:40:40
'DESTROY' =>
'INSERT INTO names VALUES ( ?, ? )' =>
0.000142s / 2 = 0.000071s avg (first 0.000039s, min 0.000039s, max 0.000103s)
'FETCH' =>
'' =>
0.000001s
'STORE' =>
'' =>
0.000065s / 5 = 0.000013s avg (first 0.000022s, min 0.000005s, max 0.000022s)
'connect' =>
'' =>
0.000685s
'default_user' =>
'' =>
0.000024s
'disconnect' =>
'' =>
0.000050s
'disconnect_all' =>
'' =>
0.000023s
'do' =>
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
0.004287s
'DROP TABLE names' =>
0.006389s
'execute' =>
'INSERT INTO names VALUES ( ?, ? )' =>
2.418587s / 1000 = 0.002419s avg (first 0.002549s, min 0.001819s, max 0.013104s)
'prepare' =>
'INSERT INTO names VALUES ( ?, ? )' =>
0.001589s
Sam Tregar's DBI::ProfileDumper module does the same thing as DBI::Profile, but it saves its result in a file instead of dumping it to standard output[2]. By default, this file is named dbi.prof, but I can use any name I like. For anything but a small application, I'll probably have to do quite a bit of custom slicing and dicing to extract the information I need.
First, I tell DBI which profiling class it should use by including it in the DBI_PROFILE value.
I join the class name to the profiling sort keys with a /:
$ env DBI_PROFILE='!Statement'/DBI::ProfileDumper ./program.pl
Once that command completes, dbi.prof has all of the profiling data. If I want to change the filename, I
just add that to DBI_PROFILE by appending it after the class name:
$ env DBI_PROFILE='!Statement'/DBI::ProfileDumper/File:dbi.prof ./program.pl
Once I have the data, I can analyze them with dbiprof, which has several options to select the
data to display, sort it in the way that I want (even on multiple keys), and many other things.
$ dbiprof --number all --sort longest
I started with a pretty bad program that made many unnecessary calls to the database and did quite a bit of work. I can make
that program more Perly, though, by using Perl's list operators smartly. Instead of all that index counting, I use push to put things onto an array. The code is much tighter and shorter, and it does the same thing.
Instead of inserting items as I go along, I move all the database stuff to the end (I have secret plans for that later), but for
now the program runs in about the same time as the previous example:
#!/usr/bin/perl
# dbi-number-inserter-end.pl
use strict;
use DBI;
my @array = ( qw( Zero ),
qw(One Two Three Four Five Six Seven Eight Nine Ten),
qw(Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen
Nineteen)
);
foreach my $name ( qw( Twenty Thirty Forty Fifty Sixty Seventy Eighty Ninety ) )
{
push @array, $name;
push @array, map { "$name $array[$_]" } 1 .. 9
}
foreach my $digit ( 1 .. 9 )
{
push @array, "$array[$digit] hundred";
push @array, map { "$array[$digit] hundred $array[$_]" } 1 .. 99;
}
my $dbh = DBI->connect( "DBI:CSV:f_dir=." );
$dbh->do( "DROP TABLE names" );
$dbh->do( "CREATE TABLE names ( id INTEGER, name CHAR(64) )" );
my $insert = $dbh->prepare( "INSERT INTO names VALUES ( ?, ? )" );
foreach my $index ( 0 .. $#array )
{
$insert->execute( $index, $array[$index] );
}
Instead of using a CSV file though, I now want to use a more sophisticated database server since I think I might be able to get better performance in writing all of this stuff to disk. I have the tools to find out, so why not? I'll use SQLite, another lightweight server that DBI can talk to. I don't have to change too much in my program since DBI hides all of that for me. I only change the DBI connection:
# dbi-number-inserter-sqlite.pl
my $dbh = DBI->connect( "DBI:SQLite:dbname=names.sqlite.db" );
When I run my program again, it's abysmally slow. It takes a lot longer to insert all of these rows into an SQLite database store:
$ time perl dbi-profile-sqlite.pl
real 5m7.038s
user 0m0.572s
sys 0m8.220s
That's awful! When I profile the program, I see that the INSERT statements take 100 times
longer than my previous programs. Outrageous!
% DBI_PROFILE='!Statement' perl dbi-profile-sqlite.pl
DBI::Profile: 266.582027s 99.63% (1015 calls) dbi-profile-sqlite.pl @ 2006-03-22 17:19:51
'' =>
0.004483s / 10 = 0.000448s avg (first 0.000007s, min 0.000003s, max 0.004329s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
0.413145s
'DROP TABLE names' =>
0.294514s
'INSERT INTO names VALUES ( ?, ? )' =>
265.869885s / 1003 = 0.265075s avg (first 0.000306s, min 0.000016s, max 0.771342s)
But this is a well-known issue with SQLite and some other databases because they automatically commit each query and wait for
the data to make it to the physical disk before moving on. Instead of inserting every row individually, I can do that in a
transaction. I don't have to actually write to the database for every INSERT. I'll do it all at
once when I COMMIT:
# dbi-profile-sqlite-transaction.pl
$dbh->do( "BEGIN TRANSACTION" );
foreach my $index ( 0 .. $#array )
{
$insert->execute( $index, $array[$index] );
}
$dbh->do( "COMMIT" );
Now the profile looks much different. Looking at the results, I can see that I've improved the insertion time by orders of magnitude, and now it's faster, by all measures, than any of the previous programs that did the same thing:
% DBI_PROFILE='!Statement' perl dbi-profile-sqlite2.pl
DBI::Profile: 1.334367s 54.19% (1016 calls) dbi-profile-sqlite2.pl @ 2006-03-22 17:25:44
'' =>
0.004087s / 9 = 0.000454s avg (first 0.000007s, min 0.000003s, max 0.003950s)
'BEGIN TRANSACTION' =>
0.000257s
'COMMIT' =>
0.255082s / 2 = 0.127541s avg (first 0.254737s, min 0.000345s, max 0.254737s)
'CREATE TABLE names ( id INTEGER, name CHAR(64) )' =>
0.271928s
'DROP TABLE names' =>
0.715443s
'INSERT INTO names VALUES ( ?, ? )' =>
0.087570s / 1002 = 0.000087s avg (first 0.000317s, min 0.000004s, max 0.003396s)
I started this chapter with Devel::SmallProf
only because I get to the results faster. Devel::DProf does much of the same thing, but stores the results in its own
format so it can do several things with them later, such as make pretty code graphs. I call it in the same way by using the -d switch.
I have a program that reads the Use.Perl[3] journal entries through its SOAP interface. I run it
with the -d switch and tell it to use Devel::DProf as the debugging module:
% perl -d:DProf journals
Once I've run the program, I have a new file named tmon.out, although I can change that with the PERL_DPROF_OUT_FILE_NAME environment
variable. This file isn't human-readable, so I need to use dprofpp to turn it into something that
I can use. The wallclock time is 53 seconds, although I spent less than a second in the CPU. I could improve parts of the program,
but the network latency and download times will still dominate the time:
$ dprofpp
LWP::Protocol::collect has 17 unstacked calls in outer
Compress::Zlib::__ANON__ has 5 unstacked calls in outer
...snipped several more lines like these...
HTTP::Message::__ANON__ has 8 unstacked calls in outer
Total Elapsed Time = 53.08383 Seconds
User+System Time = 0.943839 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
8.37 0.079 0.000 84 0.0009 0.0000 utf8::SWASHNEW
6.25 0.059 0.146 5 0.0118 0.0292 main::BEGIN
5.83 0.055 0.073 24 0.0023 0.0030 Text::Reform::form
5.09 0.048 0.067 2 0.0242 0.0334 HTTP::Cookies::Netscape::load
4.24 0.040 0.040 10 0.0040 0.0040 LWP::UserAgent::BEGIN
4.24 0.040 0.049 9 0.0044 0.0054 Text::Autoformat::BEGIN
3.71 0.035 0.035 697 0.0001 0.0001 HTTP::Headers::_header
3.18 0.030 0.030 8 0.0037 0.0037 DynaLoader::dl_load_file
3.18 0.030 0.079 2 0.0149 0.0393 Text::Template::GEN0::BEGIN
3.18 0.030 0.068 17 0.0017 0.0040 LWP::Protocol::implementor
2.65 0.025 0.045 221 0.0001 0.0002 SOAP::SOM::_traverse_tree
2.54 0.024 0.024 892 0.0000 0.0000 HTTP::Cookies::set_cookie
2.44 0.023 0.023 1060 0.0000 0.0000 Text::Reform::_debug
2.44 0.023 0.061 293 0.0001 0.0002 SOAP::SOM::_traverse
2.12 0.020 0.030 5 0.0040 0.0059 HTTP::Cookies::BEGIN
If I don't want to sort on execution time, dprofpp offers other options. With -l (lowercase L), I get the output in order of most frequently called subroutine:
$ dprofpp -l
LWP::Protocol::collect has 17 unstacked calls in outer
Compress::Zlib::__ANON__ has 5 unstacked calls in outer
... snip ...
HTTP::Message::__ANON__ has 8 unstacked calls in outer
Total Elapsed Time = 53.08383 Seconds
User+System Time = 0.943839 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
1.06 0.010 0.010 1525 0.0000 0.0000 UNIVERSAL::isa
0.21 0.002 0.002 1184 0.0000 0.0000 LWP::Debug::debug
- - -0.009 1156 - - SOAP::Data::new
2.44 0.023 0.023 1060 0.0000 0.0000 Text::Reform::_debug
2.54 0.024 0.024 892 0.0000 0.0000 HTTP::Cookies::set_cookie
- - -0.005 753 - - SOAP::Serializer::new
- - -0.014 734 - - SOAP::Serializer::__ANON__
3.71 0.035 0.035 697 0.0001 0.0001 HTTP::Headers::_header
- - 0.000 527 - 0.0000 HTTP::Message::__ANON__
0.74 0.007 0.007 439 0.0000 0.0000 UNIVERSAL::can
0.64 0.006 0.004 425 0.0000 0.0000 HTTP::Headers::__ANON__
- - -0.002 382 - - SOAP::Utils::o_lattr
- - -0.002 369 - - SOAP::Trace::__ANON__
- - -0.002 323 - - HTTP::Message::_elem
0.64 0.006 0.024 323 0.0000 0.0001 HTTP::Headers::push_header
With dprofpp's -T switch, I can see the chain of subroutine
calls. My journals program made about 25,000 subroutine calls in all (it didn't really seem that
complicated when I wrote it), and here's a short section of the output where I see the SOAP module doing its work:
$ dprofpp -T
... snip ...
SOAP::Serializer::encode_object
SOAP::Serializer::multiref_object
SOAP::Serializer::gen_id
SOAP::Serializer::__ANON__
SOAP::Serializer::new
UNIVERSAL::isa
UNIVERSAL::isa
SOAP::Serializer::encode_scalar
SOAP::XMLSchema::Serializer::xmlschemaclass
SOAP::Serializer::maptypetouri
SOAP::Serializer::encode_object
SOAP::Serializer::multiref_object
SOAP::Serializer::gen_id
SOAP::Serializer::__ANON__
SOAP::Serializer::new
... snip ...
I should also note, however, the Devel::DProf sometimes runs into problems and produces segmentation faults. In that case, I can use Devel::Profiler, which is a pure Perl replacement for Devel::DProf, although slower.
The Devel::SmallProf module from the first examples isn't really all that complicated. When I look under the hood, I don't see that much code. It's very easy, in fact, to write my own profiler[4].
I'm going to create a profiler to simply count the number of times Perl sees a certain line of code during the runtime. The Devel::SmallProf module already does that for me, but once I know how to program the basics myself, I can adapt it to just about anything that I want to do.
When I run my program with the -d switch, for each statement Perl calls the special subroutine
&DB::DB (the default Perl debugger is just a program and works in much the same way). That's the
subroutine named DB in the package DB. I can do whatever I like
in that subroutine, so I'm going to accumulate that count of the number of times I've seen that line of code:
package Devel::LineCounter;
package DB;
use strict;
use warnings;
my @counter = ();
sub DB
{
my( $file, $line ) = ( caller )[1,2];
next unless $file eq $0;
$counter[$line]++;
}
To get profiling output without changing my original program, I add an END block to my LineCounter module. That will execute somewhere near the very end of the program, although it might
execute before other END blocks:
END
{
print "\nLine summary for $0\n\n";
open FILE, $0 or die "Could not open $0\n$!";
while( <FILE> )
{
printf "%6d %s", $counter[++$count] || 0, $_;
}
}
I store my new module in the right place (i.e. Devel/LineCounter.pm somewhere in Perl's module search path), then invoke it with the -d switch:
% perl -d:LineCounter factorial.pl
The Devel::Cover module profiles my test suites to tell me how much of the code base they actually test. It counts the number of times the test suite runs a line of code, as well as keeping track of which code branches I follow. Ideally, my test should touch every line of code and exercise every possible set of conditions.
Devel::Cover comes with the cover command, which reports the coverage statistics for code. To use it, I first clear anything it
might have done before. I don't really need to clear the coverage data. I might want to add the current run data to what I've
already done, or to other coverage databases for other parts of the project.
$ cover -delete
Once I've cleared the coverage database, I run my program after loading Devel::Cover. The conventional make test
invocation uses Test::Harness to run each test
program, so I tell Test::Harness to load Devel::Cover by setting HARNESS_PERL_SWITCHES with additional information for the command line to call each test program:
$ HARNESS_PERL_SWITCHES=-MDevel::Cover make test
If I'm using Module::Build instead of ExtUtils::MakeMaker, I don't have to do so much work:
$ ./Build testcover
Just as with the other Devel:: modules, Devel::Cover watches as the code runs, and uses a lot
of black magic to decide what's going on. It stores all of this information in a directory named cover_db.
Finally, the cover command turns all of that data into something that I can understand, writing
a summary to STDOUT and creating a detailed report in coverage.html. The HTML file links to several other
HTML files, allowing me to drill down into the program. Here's a run that analyzes my HTTP::Size module:
$ cover
Reading database from /Users/brian/Dev/HTTP/Size/cover_db
---------------------------- ------ ------ ------ ------ ------ ------ ------
File stmt branch cond sub pod time total
---------------------------- ------ ------ ------ ------ ------ ------ ------
blib/lib/HTTP/Size.pm 95.5 70.0 69.2 100.0 100.0 100.0 88.4
Total 95.5 70.0 69.2 100.0 100.0 100.0 88.4
---------------------------- ------ ------ ------ ------ ------ ------ ------
The summary shows me that in my test suite I've executed 95.5% of the statements in the module. Along with that, I've only tested 70% of the possible execution paths and 69.2% of the possible combinations of conditions. That's just the summary, though. The HTML report in coverage.html tells me much more.

The HTML report has columns for each of the types of coverage that it measures, and it colors the table cells green to tell me that I have 100% coverage for that measurement on that line of code, and red to tell me that I have more testing work to do.

The coverage report for a particular file shows me how well I tested that line of code.
Before I decide how to improve my Perl program, I need to profile it to determine which sections need the most work. Perl profilers are just specialized debuggers, and if I don't like what's already out there, I can make my own profiler.
The perldebguts documentation describes creating a custom debugger. I write more about those in my articles for The Perl Journal, "Creating a Perl Debugger", http://www.ddj.com/184404522, and "Profiling in Perl" http://www.ddj.com/184404580.
"The Perl Profiler" is Chapter 20 of Programming Perl, 3rd Edition. Anyone on the road to Perl mastery should already have this book.
Perl.com has two interesting articles on profiling: "Profiling Perl", by Simon Cozens, http://www.perl.com/lpt/a/850 and "Debugging and Profiling mod_perl Applications", by Frank Wiles, http://www.perl.com/pub/a/2006/02/09/debug_mod_perl.html.
Randal L. Schwartz writes about profiling in "Speeding up Your Perl Programs" for Unix Review, http://www.stonehenge.com/merlyn/UnixReview/col49.html, and "Profiling in Template Toolkit via Overriding" for Linux Magazine, http://www.stonehenge.com/merlyn/LinuxMag/col75.html.