[torqueusers] Some nodes not returning .o and .e files

Andrew Mather mathera at gmail.com
Fri Dec 3 14:58:55 MST 2010

Hi All,

We're just in the final stages of commissioning our new cluster and have
come across some strange behaviour.

Because we often have a very large number of short-running (up to 10 hours)
jobs, our login nodes are set up to also receive jobs from the queueing
system that can run and complete overnight (between 6pm and 8am).

On all our compute nodes, when jobs finish, the .o and .e files created by
Torque are returned to the user as expected, but on the login nodes, they're
not.  They are created as usual, but just end up in the undelivered
directory under /usr/spool/PBS.

These nodes are using the same config as the others, so the filesystems
identified in usecp are the same.  These filesystems are mounted across the
cluster via NFS from the same sources.
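
To make the usecp point concrete, here is a minimal Python sketch of how a
$usecp rule is generally understood to be applied: the output destination
(host:path) is checked against a host pattern and a path prefix, and if both
match, a local cp to the mapped path is used instead of a remote copy. This
is a simplified model, not Torque's actual implementation; the function
names, host names and paths are all hypothetical and are not taken from our
config.

```python
from fnmatch import fnmatch


def parse_usecp(line):
    """Split a '$usecp host_pattern:remote_prefix local_prefix' line."""
    directive, source, local_prefix = line.split()
    if directive != "$usecp":
        raise ValueError("not a $usecp line: " + line)
    host_pattern, remote_prefix = source.split(":", 1)
    return host_pattern, remote_prefix, local_prefix


def local_copy_target(destination, rules):
    """Return the local path a matching rule maps to, or None.

    Simplified model (an assumption): wildcard match on the host part,
    plain prefix match on the path part, first matching rule wins.
    """
    host, path = destination.split(":", 1)
    for host_pattern, remote_prefix, local_prefix in rules:
        if fnmatch(host, host_pattern) and path.startswith(remote_prefix):
            return local_prefix + path[len(remote_prefix):]
    return None


# Hypothetical rule and destinations, for illustration only.
rules = [parse_usecp("$usecp *.example.org:/home /home")]

# Fully qualified submit host: the rule matches, so a local copy is possible.
print(local_copy_target("login1.example.org:/home/user/job.o123", rules))

# Short host name: the pattern does not match, so no local copy.
print(local_copy_target("login1:/home/user/job.o123", rules))
```

Under this model, a job whose destination host is recorded in a different
form (short name versus fully qualified name, for example) would fall
through to a remote copy even though the usecp lines are identical on every
node. Whether that applies here is only a guess.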

Apart from the time restriction, the only other difference we can see
between these and the compute nodes is that these are also the submit hosts
for jobs.  As far as we can tell, the config is the same as the one on our
existing cluster, where everything works as expected.

Can anyone shed some light on where we might look next?


"Computers are incredibly fast, accurate, and stupid;
Humans are incredibly slow, inaccurate and brilliant.
Together they are powerful beyond imagination."
   Albert Einstein
"A committee is a cul-de-sac, down which ideas are lured and then quietly
strangled."
   Sir Barnett Cocks
"A mind is like a parachute. It doesn't work if it's not open." :- Frank