[torquedev] Profiling for pbs_mom

Steve Snelgrove ssnelgrove at clusterresources.com
Wed May 7 15:41:05 MDT 2008


I just added the following section to the Torque Admin manual.  If 
anyone has much experience with profiling, I would appreciate their 
comments and suggestions.  Thanks.

http://www.clusterresources.com/torquedocs21/10.1troubleshooting.shtml

------------------------------------------------

Some hard problems in Torque involve the amount of time spent in 
routines. For example, one currently open problem appears to be caused 
by the design of the code in linux/mom_mach.c where the statistics are 
gathered for the node status. It appears that the */proc* filesystem, 
which contains information about the kernel and the processes, is being 
accessed so often on some machines that the response to other message 
traffic is affected. The machine where this is happening has 128 
processors.

To debug these kinds of problems, it can be useful to see where in the 
code time is being spent. This is called profiling, and there is a Linux 
utility, *gprof*, that will output a listing of routines and the amount 
of time spent in each of them. This requires that the code be compiled 
with special options that instrument the code and produce a file, 
gmon.out, which is written at the end of program execution.

The following listing shows how to build Torque with profiling enabled. 
Notice that the output file for pbs_mom will end up in the mom_priv 
directory because its startup code changes the current working directory 
to this location. The momctl -s command shuts pbs_mom down, which is 
what causes gmon.out to be written.

# ./configure "CFLAGS=-pg -lgcov -fPIC"
# make -j5
# make install
# pbs_mom
... do some stuff for a while ...
# momctl -s
# cd /var/spool/torque/mom_priv
# gprof -b `which pbs_mom` gmon.out |less
#
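
Because gmon.out only appears once pbs_mom has exited, it can help to 
check for the file before running gprof. The following is a small 
sketch, not part of Torque: the helper name have_profile and the 
MOM_PRIV variable are hypothetical, and the default path is simply the 
one used in the listing above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Torque: succeeds only if a non-empty
# gmon.out exists in the directory given as the first argument.
have_profile() {
    [ -s "$1/gmon.out" ]
}

# Path taken from the listing above; adjust it if Torque was configured
# with a different spool directory.
MOM_PRIV=${MOM_PRIV:-/var/spool/torque/mom_priv}

if have_profile "$MOM_PRIV"; then
    echo "profile data found: $MOM_PRIV/gmon.out"
else
    echo "no gmon.out in $MOM_PRIV; has pbs_mom exited yet?"
fi
```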



