Die-ing on the Web
by brian d foy

Web servers aren't very friendly to misbehaving CGI scripts - they like to complain rather than offer constructive criticism, so the detective work is left to the script maintainer. Often the script maintainer is not the same person as the script author, which makes diagnosing the problem that much harder. I present four methods for dealing with uncooperative CGI scripts. The first two, "Fiddling with die()" and "Redirecting STDERR", show some basic error-handling techniques that you can apply to more than just CGI scripts. The third method illustrates the CGI::Carp module, which does the work of the first two methods for you. The last method should become part of every CGI developer's personal library - a custom-built error-handling routine that not only sends error messages to the right places, but can also mail the appropriate person or take other actions to try to fix the problem.

A respected physicist once consoled me with "An expert is someone who has made every mistake". I defer on expert status, but here's what I've learned from my mistakes.

1. FIDDLING WITH DIE

What happens when you need to fix a CGI script that you did not write? I have had plenty of clients ask me to quickly patch their existing script so they had something in production while I re-coded the project. Amazingly enough, I have found a lot of CGI scripts that use die()! Usually the problem is a failed open() that mucks up the works before an HTTP header can reach the server, which leads to server errors. The first time I had to debug someone else's CGI script, I rushed off to the substitute function, s/die/$another_thing/g, only to REALLY cause problems. Luckily no one was around to see this, and I quickly figured out that I needed to start over with the original file. Now, rather than going through the code to change every instance of die(), I simply redefine what die() does.
Instead of printing to STDERR, I can change die() to print to STDOUT and include a minimal HTTP header. I change what die() does by fiddling with $SIG{__DIE__} - the signal that is sent when die() is invoked. In this case, I use an anonymous subroutine to print die()'s message to STDOUT. (The same thing works for warn() and $SIG{__WARN__}.)

    #!/usr/bin/perl

    # use an anonymous subroutine to replace die()'s
    # default behaviour
    $SIG{__DIE__} = sub {
        my $message = shift;

        print STDOUT "The following message is brought to you anonymously:\n";
        print STDOUT "$message\n";
    };

    print STDOUT "Content-type: text/plain\n\n";
    print STDOUT "This is from STDOUT\n";

    # well, at least my system doesn't have this file :)
    open FILE, '/etc/password' or die "This is from die: $!\n";

    __END__

This is a great tool if you inherit a script from a colleague, but I don't recommend the technique if you are starting a long script from scratch. I notice that Randal likes to use it in his WebTechniques scripts [*]. That's probably fine for short scripts, but a code reviewer might not remember what you did to die() several hundred lines further down. Um, not that I know this from experience or anything.

2. REDIRECTING STDERR TO STDOUT

What happens if I have redefined die(), but I still get output on STDERR? If STDERR flushes before the server receives a proper HTTP header, I get more server errors and more frustration. I once had to debug a complicated mess of CGI scripts whose documentation seemed to be some sort of ASCII-fied Cyrillic language. The tangle of require()'s and files was quite a mess, and I couldn't read the documentation, but something was secretly printing to STDERR. I wanted to see the error message so I could infer where it was coming from by searching for pieces of the message. However, I was on a VT100 terminal at the time (yes, they still exist), so watching the server log was annoying and time-consuming.
Running the script from the command line interwove STDOUT and STDERR into one big HTML mess. I decided to redirect STDERR to STDOUT so that I could see the error message in Lynx, which would also nicely format the HTML. There is an example in "How do I capture STDERR from an external command?" in the Perl FAQ [*], but it gets a bit tricky, since I needed to print things in a certain order for the web server not to give me an error:

    #!/usr/bin/perl

    BEGIN {
        # we want STDOUT to flush right away
        select(STDOUT);
        $| = 1;

        print STDOUT "Content-type: text/plain\n\n";
    }

    open STDERR, ">&STDOUT";

    print STDOUT "This is from STDOUT\n";
    print STDERR "This is from STDERR\n";

    die 'This is from die';

    __END__

Notice that I set STDOUT to autoflush. If I don't do that, the STDERR handle flushes first and I get a "malformed header from script" error, even though I supposedly output the HTTP header in a BEGIN block.

3. USING CGI::Carp

Now that I've shown you the basics of managing fatal errors and STDERR, you can forget about them and use CGI::Carp, which comes with the standard perl distribution. The CGI::Carp module easily traps fatal errors and sends them elsewhere. There are two exportable functions that do simple error handling - carpout() and fatalsToBrowser(). The carpout() function allows me to redirect the output from die, warn, croak, confess, and carp to another file handle. The POD suggests that I set up this redirection in a BEGIN block so I can catch some compile-time errors:

    BEGIN {
        use CGI::Carp qw(carpout);

        open(ERROR_LOG, ">>my_error_log")
            or die("my_error_log: $!\n");

        carpout(\*ERROR_LOG);
    }

Beware, though! We have now redirected STDERR to a file. STDERR is where perl prints the

    carpout.cgi syntax OK

message after you test your script from the command line with

    perl -cw carpout.cgi

which we are all doing, right?
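One way to keep those command-line diagnostics visible - my own sketch, not part of the CGI::Carp documentation - is to save a duplicate of STDERR onto another handle before carpout() takes it over:

```perl
#!/usr/bin/perl

# sketch: keep a copy of the real STDERR before carpout()
# redirects it, so terminal diagnostics stay visible
BEGIN {
    use CGI::Carp qw(carpout);

    # duplicate STDERR onto a new handle first
    open(REAL_STDERR, ">&STDERR")
        or die("can't dup STDERR: $!\n");

    open(ERROR_LOG, ">>my_error_log")
        or die("my_error_log: $!\n");

    carpout(\*ERROR_LOG);
}

# this goes quietly to my_error_log
warn "logged to the file\n";

# this still reaches the terminal
print REAL_STDERR "visible on the tty\n";
```

The handle name REAL_STDERR is my own choice; anything not already taken will do.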
Well, I wasn't thinking about this when I tried it the first time, so my screen looked like

    dog[32] perl -cw carpout.cgi
    dog[33]

I couldn't figure out what I had done wrong, since I had never seen perl not return anything. Furthermore, running the script appeared equally fruitless. Its whole job was to die(), so its output disappeared into the corn fields too.

    dog[36] ./carpout.cgi
    dog[37]

When I start playing with redirection, I usually need to draw myself a picture to avoid such lapses of memory. Despite all of this, I still need to handle sending the HTTP header to the server, since carpout() doesn't do that for me. But today is my lucky day, since CGI::Carp already knows how lazy I am. The second method redirects fatal messages from die() or confess() to the browser, along with a minimal set of HTTP headers so that the server won't complain. Strangely enough, it's called fatalsToBrowser():

    #!/usr/bin/perl

    use CGI::Carp qw(fatalsToBrowser);

    open FILE, "quotes.txt" or die("$!\n");

    print <<"HTTP";
    Content-type: text/plain

    HTTP

    while( <FILE> ) { print }

    close FILE;

    __END__

Some HTML gets sent to my browser:

    Software error:

    No such file or directory

    Please send mail to this site's webmaster for help.

and the error message is neatly recorded in the error log with a time stamp and the source filename:

    [Sat Oct 18 04:20:48 1997] fatalsToBrowser.cgi: No such file or directory

4. ROLLING YOUR OWN

By far the best solution for large projects is a custom error-handling routine. Instead of using die(), I have my own routine, cgi_error(), which I export from a module. I can do various clean-up sorts of things, as well as making sure that I get an intelligent message in the browser if something goes wrong. I can even send myself nasty little email messages:

    sub cgi_error {
        my $message = shift;

        print <<"HTTP";
    Content-type: text/plain

    there was an error:

    $message
    HTTP

        open MAIL, '| /usr/lib/sendmail -t -odq -oi';

        print MAIL <<"MESSAGE";
    To: brian\@sri.net
    From: jimminy_cricket\@sri.net
    Subject: Your groovy CGI, baby.

    something horrible has happened:

    $message
    MESSAGE

        close MAIL;
    }

I can add even more to cgi_error() to collect most of the data that I need to diagnose the problem, and maybe even fix it. This was especially handy with one script that needed to be setuid. If it wasn't setuid, it would loop eternally, trying to get a record lock on a database that the server UID did not have permission to write. The script did not seem to break when it lacked permission; there were no server errors or other indications of failure. It just ate as much processing time as it could, and it looked like a very slow connection from the browser's perspective. Adding a cgi_error() call fixed the problem - if the script could not get the database lock after so many tries, it called cgi_error(), which diagnosed a few things. Is the process that has the lock still running? If not, cgi_error() can call a lock_smith script to release the zombie lock. Does the script have the right permissions? If not, cgi_error() can call a program to restore the proper permissions to the script.
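The liveness check can be sketched like this, assuming - my own convention, not something from the original script - that the lock file's first line is the PID of the process that took the lock:

```perl
#!/usr/bin/perl

# sketch: is the process that holds the lock still alive?
# assumes the lock file's first line is the locker's PID
sub lock_holder_alive {
    my $lock_file = shift;

    open LOCK, $lock_file or return 0;
    chomp( my $pid = <LOCK> );
    close LOCK;

    return 0 unless $pid =~ /^\d+$/;

    # kill with signal 0 delivers nothing, but returns true
    # if the process exists and we may signal it
    return kill 0, $pid;
}
```

cgi_error() could call something like this before deciding whether to invoke the lock_smith script.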
Once cgi_error() has tried to fix the problem, it collects some information and sends me a message saying what it tried to do and what actually happened. I have found this to be a far superior solution to listening to clients say "I don't know - it's broken", or to getting up before three in the afternoon to diagnose the problem by hand. Once I developed a cgi_error() function with which I was satisfied, I found that my time spent in the debugging phase of a project was greatly reduced.

IN SUMMARY

I presented several techniques for dealing with error handling in CGI scripts - some specific to the CGI environment and others that can easily be adapted to other situations. I highly encourage any CGI developer, especially those who distribute their software, to develop robust and sensible methods of error handling.

REFERENCES

The Perl FAQ, Section 8, "System Interaction"
Randal Schwartz's WebTechniques columns

__END__