Creating Perl Code Graphs

by brian d foy for Dr. Dobbs Journal

In my last article I showed you how to profile your code with the Devel::SmallProf module, which reports times and counts for each line of code. In this article I go further and use the Devel::GraphVizProf module to turn that information into code graphs.

A graph, in this sense, is a collection of connected nodes. In a code graph, the nodes are the executable statements of the program, and they are connected by "edges" that show the flow of control from one statement to the next. This sort of graph is also known as a "directed graph" since each edge points in the direction of flow from one node to another.

GraphViz is an open-source graph visualization package developed by AT&T that can help developers visualize structural information, such as code flow, database table relationships, or the links between web pages. AT&T provides GraphViz and many other interesting tools free of charge. How easy it is to install depends on your operating system. On FreeBSD, simply go to /usr/ports/graphics/graphviz, run make install, and then go off for a cup of coffee. Installing GraphViz is a bit more involved on Red Hat Linux due to some incompatibilities mentioned on the Graph Visualization Project development site. There appears to be an initial version for Windows, but I have not tried it. Programmer beware!

You can get Devel::GraphVizProf from the Comprehensive Perl Archive Network (CPAN). It is part of the GraphViz module distribution by Leon Brocard, who also presented a talk about Perl code graphs at YAPC::Europe 2000. As of version 0.12 it does not install automatically, but all you need to do is copy the Devel directory to an appropriate Perl library directory. If you have not done this before, or cannot install modules into the Perl library directories, perlfaq8 can help you figure out what to do. Although you may suffer a bit more while installing this module, the coolness factor is worth the pain.

GraphViz can do quite a bit and comes with more tools than I will show here, so see its documentation for more details. To show a simple code graph, I wrote a sample program that does not do anything useful,

	
	#!/usr/bin/perl

	my $test = 0;

	while( $test++ < 15 )
	        {
	        my_print("Hello $test\n");
	        }

	sub my_print
	        {
	        print $_[0];
	        }

and then I wrote its graph description. Each executable statement is defined as a node, and the edges are defined as connections between them. In this case, I connect statements that follow each other during program execution. Rather than discuss the dot syntax here, I refer you to the dot documentation so I can get on with the cool stuff. Later, the Devel::GraphVizProf module will do all of this for me.

	
	digraph test {
	        bgcolor="white";
	        node2 [color="0,1,0", label="my $test = 0;"];
	        node4 [color="0,1,0", label="my_print(\"Hello $test\n\");"];
	        node3 [color="0,1,0", label="print $_[0];"];
	        node1 [color="0,1,0", label="while( $test++ < 15 )"];
	        node2 -> node1 [color="0,1,0", len="2", w="0"];
	        node4 -> node3 [color="0,1,0", len="2", w="0"];
	        node3 -> node1 [color="0,1,0", len="2", w="0"];
	        node1 -> node4 [color="0,1,0", len="2", w="0"];
	}
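Writing dot by hand gets tedious quickly. As a rough sketch of what a tool can do for you, the short Perl script below stores the sample program's statements as nodes and its flow as directed edges in plain hashes, then prints a description similar to the one above. The hash layout and the to_dot() and dot_quote() helpers are my own illustration, not part of GraphViz or Devel::GraphVizProf.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nodes: the executable statements of the sample program (illustrative).
my %label = (
    node1 => 'while( $test++ < 15 )',
    node2 => 'my $test = 0;',
    node3 => 'print $_[0];',
    node4 => 'my_print("Hello $test\n");',
);

# Directed edges: each node maps to the nodes that can execute next.
my %edges = (
    node2 => ['node1'],
    node1 => ['node4'],
    node4 => ['node3'],
    node3 => ['node1'],
);

# Wrap a label in double quotes, escaping embedded double quotes the
# same way the hand-written description does.
sub dot_quote {
    my ($text) = @_;
    $text =~ s/"/\\"/g;
    return qq{"$text"};
}

# Build a dot description: one line per node, one line per edge.
sub to_dot {
    my ($name, $label, $edges) = @_;
    my @out = ('digraph ' . $name . ' {', '        bgcolor="white";');
    for my $node (sort keys %$label) {
        push @out, sprintf '        %s [color="0,1,0", label=%s];',
            $node, dot_quote($label->{$node});
    }
    for my $from (sort keys %$edges) {
        for my $to (@{ $edges->{$from} }) {
            push @out, sprintf '        %s -> %s [color="0,1,0"];', $from, $to;
        }
    }
    push @out, '}';
    return join("\n", @out) . "\n";
}

print to_dot('test', \%label, \%edges);
```

Redirect the output to a file and it can go straight to dot.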

Once I have created the nodes and connected them with edges, I transform the graph description into an image with the dot utility that comes with the GraphViz distribution. This program can produce output in several formats, including Adobe PostScript, FrameMaker MIF, PNG, and many others. For this article I use PNG so you can see the images. To generate the image file, I tell dot which output format I want with the -T switch and the output file name with the -o switch, followed by the name of the file that contains the graph description. The -G switch lets me set attributes for the entire graph; in this case I want a white background. You might not need this, but if you get an image full of black, GraphViz probably does not know which background color you want and uses black by default.

	
	prompt$ dot -Gbgcolor="white" -Tpng -o example.png example.dot

The image shows the graph that I created.
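If you regenerate images often, you can drive dot from Perl as well. This sketch builds the same command line as a list and passes it to system(), which then runs dot without involving a shell, so no quoting is needed. The dot_command() helper and the file names are illustrative; the script only calls dot if example.dot exists, and it assumes dot is on your PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for dot: -G sets a graph attribute,
# -T picks the output format, -o names the output file.
sub dot_command {
    my ($input, $output, $format) = @_;
    $format = 'png' unless defined $format;
    return ('dot', '-Gbgcolor=white', "-T$format", '-o', $output, $input);
}

my @command = dot_command('example.dot', 'example.png');

# Run dot only when there is something to convert; the list form of
# system() does not go through a shell.
if (-e 'example.dot') {
    system(@command) == 0
        or die "dot exited with status $?\n";
}
```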

I can also change the color of the edges to encode more information in the graph. The color of an edge can indicate how often the program goes from one statement to another, so I can literally see the parts of the program that might deserve more attention for optimization or debugging. In this example I have colored the edges involved in the loop blue to indicate that they are traversed more often than the other edges.

	
	digraph test {
	        node2 [color="0,1,0", label="my $test = 0;"];
	        node4 [color="0,1,0", label="my_print(\"Hello $test\n\");"];
	        node3 [color="0,1,0", label="print $_[0];"];
	        node1 [color="0,1,0", label="while( $test++ < 15 )"];
	        node2 -> node1 [color="0,1,0", len="2", w="0"];
	        node4 -> node3 [color="0.667,1,1", len="2", w="0"];
	        node3 -> node1 [color="0.667,1,1", len="2", w="0"];
	        node1 -> node4 [color="0.667,1,1", len="2", w="0"];
	}

Example GraphViz graph with colored edges
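Choosing colors by hand does not scale, but the idea is easy to automate. GraphViz accepts colors as "H,S,V" triples with each component between 0 and 1, where hue 0 is red and roughly 0.667 is blue, and a value (brightness) of 0 is black whatever the hue. The hypothetical edge_color() helper below scales an edge's traversal count against the busiest edge so that rare edges come out dark and frequent ones bright blue. It only illustrates the idea; it is not necessarily the formula Devel::GraphVizProf uses, and the counts are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a traversal count to a GraphViz "H,S,V" color string: fixed blue
# hue and full saturation, with brightness proportional to the count.
sub edge_color {
    my ($count, $max) = @_;
    my $value = $max > 0 ? $count / $max : 0;
    return sprintf '0.667,1,%.3f', $value;
}

# Made-up traversal counts for the four edges of the sample graph.
my %count = (
    'node2 -> node1' => 1,
    'node1 -> node4' => 15,
    'node4 -> node3' => 15,
    'node3 -> node1' => 15,
);

# The busiest edge sets the scale.
my ($max) = sort { $b <=> $a } values %count;

for my $edge (sort keys %count) {
    printf qq{        %s [color="%s"];\n}, $edge, edge_color($count{$edge}, $max);
}
```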

I already know that the Devel::SmallProf module can count the number of times a line of code is executed and how much time it takes to execute that line. The Devel::GraphVizProf module does the same thing, but rather than print a text report as Devel::SmallProf does, it outputs a graph description, using the edge color to encode the counts. Edges between statements that are connected infrequently relative to the others are colored darker, and edges that are traversed more frequently are colored more brightly. In this example, the black edges are traversed only a couple of times, while the blue ones are traversed very frequently. I can easily identify where my program is spending its time by looking at the colored edges rather than reading through lines of text output. The power of pictures becomes apparent.
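The counting itself is simple bookkeeping. When perl runs under -d, it calls the subroutine DB::DB before each statement, and caller() inside it reports that statement's file and line, so a profiler can remember the previous statement and bump a counter for each (previous, current) pair. The sketch below shows only that bookkeeping step, applied to a list of line numbers assumed to have been collected already. The count_transitions() name and the trace are hypothetical, and this is not Devel::GraphVizProf's actual source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count transitions between consecutively executed statements.
# Takes line numbers in execution order; returns a hash of hashes where
# $edges->{$from}{$to} is how many times $to ran right after $from.
sub count_transitions {
    my @lines = @_;
    my %edges;
    for my $i (1 .. $#lines) {
        $edges{ $lines[$i - 1] }{ $lines[$i] }++;
    }
    return \%edges;
}

# A made-up trace: line 3 runs once, then lines 5 and 7 alternate.
my $edges = count_transitions(3, 5, 7, 5, 7, 5, 7);

for my $from (sort { $a <=> $b } keys %$edges) {
    for my $to (sort { $a <=> $b } keys %{ $edges->{$from} }) {
        printf "%d -> %d: %d\n", $from, $to, $edges->{$from}{$to};
    }
}
```

With this trace, the edge from line 5 to line 7 gets the highest count, 3, and would be drawn brightest.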

I modified the example script to add some lines of code that will be executed more often than those in the while loop to show how Devel::GraphVizProf displays relative frequencies of execution.

	
	#!/usr/bin/perl

	my $test = 0;

	while( $test++ < 100 )
	        {
	        my_print("Hello $test\n");
	        }

	my $sum = 0;
	foreach( 0 .. 1000 )
	        {
	        $sum += $_;
	        }

	sub my_print
	        {
	        print $_[0];
	        }

I run this script under the Devel::GraphVizProf debugger by using the -d switch.

	
	prompt$ perl -d:GraphVizProf example.pl

When the program finishes, the debugger prints to standard output a graph description that I can pass to dot. I could send the output to dot directly, but often the program I graph sends other information to standard output, or I want to change the node information a bit, so I save the description in a file until I am ready to make the graph.

	
	prompt$ perl -d:GraphVizProf example.pl > example.dot

I then edit out any extraneous output from the program and add any extra features I might want in the graph (such as background and foreground colors). Once I am satisfied I make a PNG image of the graph as I did before.

	
	prompt$ dot -Gbgcolor="white" -Tpng -o example.png example.dot
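Editing out the program's own output by hand gets old. Because the graph description comes at the end of the run, a small filter can usually pull it out of the captured file. This is a sketch under two assumptions: the description's closing brace sits alone on its own line, as in the examples here, and the program's own output does not itself contain a digraph block. The extract_digraph() name and the clean.dot file name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first "digraph ... }" block in $text (with a trailing
# newline), or undef if there is none. Ordinary program output before
# or after the block is dropped.
sub extract_digraph {
    my ($text) = @_;
    return $text =~ /(^[ \t]*digraph\b.*?^[ \t]*\}[ \t]*$)/ms
        ? "$1\n"
        : undef;
}

# Usage: perl extract_dot.pl example.dot > clean.dot
if (@ARGV) {
    local $/;    # slurp mode: read the whole file at once
    my $text = <>;
    my $dot  = extract_digraph($text);
    die "no digraph found\n" unless defined $dot;
    print $dot;
}
```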

Look at how large that image is, though: 118 KB and 5052x2751 pixels. The interesting code takes up only a small portion of it, since much of the code in the image comes from the debugger module that produces the graph description.

I don't want to see all of that, so I limit the graph to particular namespaces by creating a .smallprof file in the directory from which I run the program. Devel::GraphVizProf loads the .smallprof file at runtime with do FILE, so I can put any valid Perl statements in it. If I create a hash named %DB::packages, Devel::GraphVizProf profiles only the packages that exist as keys in that hash with a true value (anything other than 0, "0", the empty string, or undef), and only those packages appear in the code graph.

By default, Perl code lives in the main namespace (or package), which is roughly analogous to the main() function in C. If I want to profile and graph only statements in the main namespace, I can use this .smallprof file.

	
	$DB::packages{'main'} = 1;
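Since .smallprof is ordinary Perl, it can list several packages and switch one off with a false value. The sketch below pairs such a file's contents with a should_profile() function that models the rule described above. Foo::Bar and Foo::Baz are placeholder package names, and should_profile() is not the module's source; in particular, treating an empty %DB::packages as "profile everything" is my assumption, based on the full-size graph you get without a .smallprof file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Contents of a hypothetical .smallprof file.
%DB::packages = (
    'main'     => 1,
    'Foo::Bar' => 1,
    'Foo::Baz' => 0,    # false value: not profiled
);

# Model of the selection rule: with an empty %DB::packages everything is
# profiled; otherwise a package needs a true value in the hash.
sub should_profile {
    my ($package) = @_;
    return 1 unless %DB::packages;
    return $DB::packages{$package} ? 1 : 0;
}

print $_, ': ', should_profile($_), "\n" for qw( main Foo::Baz Other );
```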

I then rerun the debugger and redraw the graph, which turns out much smaller and easier to read.

	
	prompt$ perl -d:GraphVizProf example.pl > example.dot
	prompt$ dot -Gbgcolor="white" -Tpng -o example.png example.dot

The new image is much smaller and shows only the code of interest. Notice that statements that execute more often are connected by more brightly colored edges. If I had a much longer program, and a much larger code graph, I could easily scan the image for the brightest edges to see where the program is spending its time. This is not going to unlock all the secrets of my program, but I can use the graph along with other information to decide how to optimize or debug it.
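For a really large graph, you can also do that scan on the dot file itself rather than by eye. The sketch below lists edges brightest first, using the third (value) component of each edge's "H,S,V" color. It assumes one edge per line, written like the edges in this article's examples; edges with other color syntaxes are skipped, and brightest_edges() is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return edges as [from, to, brightness] triples, brightest first.
# Only lines shaped like: from -> to [color="H,S,V", ...] are recognized.
sub brightest_edges {
    my ($dot) = @_;
    my @edges;
    for my $line (split /\n/, $dot) {
        next unless $line =~
            /(\w+)\s*->\s*(\w+)\s*\[.*?color="[^,"]+,[^,"]+,([^,"]+)"/;
        push @edges, [ $1, $2, $3 ];
    }
    return sort { $b->[2] <=> $a->[2] } @edges;
}

my $example = <<'DOT';
digraph test {
        node2 -> node1 [color="0,1,0", len="2", w="0"];
        node1 -> node4 [color="0.667,1,1", len="2", w="0"];
}
DOT

printf "%s -> %s (brightness %s)\n", @$_ for brightest_edges($example);
```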

Just for kicks, I ran the test.pl script from the Business::ISBN module under the Devel::GraphVizProf debugger using a different .smallprof file so that I could also profile code in the Business::ISBN namespace.

	
	# the naked block defines the scope of the @modules array.  
	# i don't want to mess up the rest of the program ;)
	{
	my @modules = qw( main Business::ISBN );
	@DB::packages{ @modules } = @modules;
	}
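The hash slice sets each key to the module's own name, which works because any non-empty package name other than "0" is a true value. If you find the slice too clever, a plain loop does the same job with an explicit 1 for each value. This is simply an alternative spelling of the same .smallprof file, with the naked block again keeping @modules private.

```perl
use strict;
use warnings;

# .smallprof: profile the main and Business::ISBN packages.
{
    my @modules = qw( main Business::ISBN );

    # Same keys as @DB::packages{ @modules } = @modules, but every value is 1.
    $DB::packages{$_} = 1 for @modules;
}
```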

The resulting code graph is a rather large image (312 KB and 7866x6068 pixels).

There is a lot more that you can do with GraphViz to make these graphs prettier, but that is up to you. You can install prettier fonts, use different colors or outlines, and all sorts of other things to justify the use of a really expensive printer. Just do not tell your friends and coworkers how easy it is to do. :)

__END__

brian d foy has been a Perl user since 1994. He is the founder of the first Perl users group, NY.pm, and of Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past three years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Some of brian's other articles have appeared in The Perl Journal.