[torqueusers] torque on itanium

Garrick Staples garrick at usc.edu
Fri Feb 22 18:44:38 MST 2008


On Tue, Feb 12, 2008 at 10:06:34PM +0100, Jan Snigula alleged:
> Hi torque developers,
> 
> I'm trying to run torque on an 8-node Itanium cluster (Linux 2.6.9-47,
> CentOS 4.6, HP blade servers). Any time a job is started on a node, the
> pbs_mom goes up to 100% CPU time; the job is executed but ends up in the
> exiting state. At that point only a qdel -p (which leaves pbs_mom at 100%
> CPU) or an /etc/init.d/pbs_mom purge (which restores normal behavior)
> releases the job and the CPU usage.
> 
> To test it I set up a 1-node execution-only environment and ran
> 	strace -etrace=desc -F -f -ff -p pid_of_pbs_mom
> before I submitted a job. (I saved the result of this process and can
> send it to you if interested.) The overall behavior (shown below) is that
> when the job goes into execution, a huge number of "select" system calls
> is executed within pbs_mom, which drives the process to 100% CPU usage.
> 
> I tested with torque-2.0pl11 up to torque-2.3.0-snap.200801151629.
> 
> Can anyone help me?

Any errors in /var/log/messages or the mom logs?

I don't know of anyone doing development who has Itaniums anymore.

Feel free to send in relevant debugging info.  Along with the strace
output, a gdb backtrace and logging at a high loglevel would be good too.
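Before sending in a large strace log, it can help to summarize which system calls dominate it, so the select() storm described above can be quantified. The following is a minimal sketch, not part of TORQUE, assuming the log is plain strace text without timestamps (no -t/-tt), either redirected from stderr or written as per-process files when -o is combined with -ff. The script and function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the start of a syscall line (assumed plain strace format), e.g.
#   select(10, [3 4], NULL, NULL, {0, 0}) = 0 (Timeout)
#   [pid  4321] select(10, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
#   4321  select(10, [3], NULL, NULL, {0, 0}) = 0 (Timeout)
# Lines such as "<... select resumed>", "--- SIGCHLD ... ---" and
# "+++ exited with 0 +++" do not match and are skipped, so a call that
# strace split into "unfinished"/"resumed" halves is counted only once.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Return a Counter mapping syscall name -> number of occurrences."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def main(paths):
    total = Counter()
    for path in paths:
        with open(path, errors="replace") as f:
            total += count_syscalls(f)
    # Print the ten most frequent syscalls across all given files.
    for name, n in total.most_common(10):
        print(f"{n:10d}  {name}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, for example, `python3 count_syscalls.py mom-trace.*` (file names hypothetical); a log in which select dwarfs every other call would be consistent with the busy loop reported above.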

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html