[torqueusers] Strange job behaviour

Greg Wimpey gwimpey at mines.edu
Wed Sep 22 09:59:50 MDT 2004


Have you logged in to the node while the job is running to watch what's
going on (e.g., run top and see if some unexpected process is running
alongside)?  When you say "equal" jobs, do you mean identical?  Same
code, same input data and parameters?  Do the jobs read/write files from
an NFS server?  If so, have you tried running the job using data on
local disk?  I'm assuming the nodes are configured identically (same
CPU, same amount/type of RAM, same O/S version).

This is where I would start looking.
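
For the first check (an unexpected process running alongside the job), a minimal Python sketch follows. It is an illustration, not part of Torque: the function names, the 5% threshold, and the sample user name are all hypothetical, and it assumes a Linux node whose ps accepts "-eo pid,user,pcpu,comm". Note that ps reports %CPU averaged over each process's lifetime, so a live view in top is still worth a look.

```python
import subprocess


def parse_ps(ps_text):
    """Parse `ps -eo pid,user,pcpu,comm` output into (pid, user, cpu, command) tuples."""
    rows = []
    for line in ps_text.strip().splitlines()[1:]:  # first line is the header
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # skip anything that does not look like a process row
        pid, user, pcpu, comm = parts
        rows.append((int(pid), user, float(pcpu), comm.strip()))
    return rows


def unexpected_hogs(rows, job_user, min_cpu=5.0):
    """Return processes NOT owned by job_user that use at least min_cpu percent CPU.

    The result is sorted with the busiest process first.
    """
    hogs = [r for r in rows if r[1] != job_user and r[2] >= min_cpu]
    return sorted(hogs, key=lambda r: r[2], reverse=True)


def snapshot():
    """Take one process snapshot on the local node (run this on the slow node)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,user,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

Run on the node hosting the slow job, something like unexpected_hogs(snapshot(), job_user="paulo") would list anything other than the job owner's processes that is taking a noticeable share of the CPU (a cron job, a runaway daemon, a stale process left over from an earlier job).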

On Sat, 2004-09-18 at 08:17, Paulo Silva wrote:
> Hello,
> 
> I'm using Torque on a 16-node Beowulf cluster and I've noticed that
> sometimes, if I submit 16 jobs simultaneously, one job seems to run
> more slowly than the others.
> 
> I tried submitting 16 equal jobs and the problem remains most of the
> time. I've also noticed that the problem doesn't always happen on the
> same node (so I'm ruling out a hardware problem).
> 
> This may not even be a Torque/PBS issue, but if someone has already had
> the same problem maybe I can get a clue as to what's happening.
> 
> Thanks for any advice.


