[torqueusers] torque on itanium
garrick at usc.edu
Fri Feb 22 18:44:38 MST 2008
On Tue, Feb 12, 2008 at 10:06:34PM +0100, Jan Snigula alleged:
> Hi torque developers,
> I'm trying to run torque on an 8-node itanium cluster (linux 2.6.9-47
> 4.6 HP blade servers). Any time a job is started on a node, the
> pbs_mom goes up to 100% CPU time; the job is executed but ends up in
> the exiting state. Here only a qdel -p (which left the pbs_mom in
> 100% CPU status) or a /etc/init.d/pbs_mom purge (which results in
> normal behavior) releases the job and the CPU usage.
> To test it I set up a 1-node execution-only environment and did a
> strace -etrace=desc -F -f -ff -p pid_of_pbs_mom before I submitted a
> job. (I saved the result of this process and can send it to you if
> interested). The overall behavior (shown below) is that when the job
> goes into execution, a huge number of "select" system calls are
> executed by pbs_mom, which drives the process to 100% CPU usage.
> I tested with torque-2.0pl11 up to torque-2.3.0-snap.200801151629.
> Can anyone help me?
Any errors in /var/log/messages or the mom logs?
I don't know of anyone doing development who has Itaniums anymore.
Feel free to send in relevant debugging info. Along with the strace
output, a gdb backtrace and logging at a high loglevel would be good too.
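
As a rough sketch of how that info might be gathered (the function names
summarize_strace and collect_backtrace are made up for illustration, and this
assumes gdb, pidof and the usual GNU text tools are installed on the node),
something like the following could condense the saved strace output into a
per-syscall count and grab a backtrace from the spinning pbs_mom:

```shell
#!/bin/sh
# Hypothetical helper: count system calls by name in saved strace output.
# Accepts one or more files; handles plain lines as well as lines
# prefixed with "[pid N] " or a bare pid, as strace -f may emit.
summarize_strace() {
    sed -E 's/^\[pid +[0-9]+\] //; s/^[0-9]+ +//' "$@" |
        grep -oE '^[a-z_0-9]+\(' |
        tr -d '(' |
        sort | uniq -c | sort -rn
}

# Hypothetical helper: attach gdb to the running pbs_mom, print a
# backtrace of every thread, then detach. Needs root on the node and
# assumes exactly one pbs_mom process is running.
collect_backtrace() {
    gdb -p "$(pidof pbs_mom)" -batch -ex 'thread apply all bt'
}
```

For example, "summarize_strace /tmp/mom.strace.*" (path assumed) would list
the most frequent calls first, so a select storm shows up on the top line.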
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.