[torqueusers] Some nodes not returning .o and .e files
mathera at gmail.com
Fri Dec 3 14:58:55 MST 2010
We're just in the final stages of commissioning our new cluster and have
come across some strange behaviour.
Because we often have a very large number of short-running (up to 10 hours)
jobs, our login nodes are set up to also receive jobs from the queueing
system that can run and complete overnight (between 6pm and 8am).
On all our compute nodes, when jobs finish, the .o and .e files created by
Torque are returned to the user as expected, but on the login nodes they're
not. They are created as usual, but end up in the undelivered
directory under /usr/spool/PBS.
These nodes use the same config as the others, so the filesystems
identified in usecp are the same. These filesystems are mounted across the
cluster via NFS from the same sources.
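To make the usecp mechanism concrete, here is a simplified Python model of what a `$usecp HOST:SRCDIR DSTDIR` line in a MOM's `mom_priv/config` is meant to do: map a job's output destination (`host:path`) onto a local path so the file can be copied with a plain `cp` instead of a remote copy back to the submit host. This is a sketch under stated assumptions, not Torque's actual code: the hostnames and paths below are hypothetical placeholders, host patterns are matched with shell-style wildcards via `fnmatch`, and paths are matched by directory prefix. Torque's own matching logic may differ in detail.

```python
from fnmatch import fnmatch

# Hypothetical mom_priv/config lines; hosts and paths are placeholders,
# not the poster's actual settings.
CONFIG = """
$usecp *.example.org:/home /home
$usecp *.example.org:/scratch /scratch
"""


def parse_usecp(config_text):
    """Return (host_pattern, src_prefix, local_prefix) tuples for each $usecp line."""
    rules = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "$usecp":
            host, _, src = parts[1].partition(":")
            rules.append((host, src, parts[2]))
    return rules


def delivery_method(dest, rules):
    """Simplified model: local cp if dest matches a $usecp rule, else a remote copy."""
    host, _, path = dest.partition(":")
    for host_pat, src, local in rules:
        prefix = src.rstrip("/")
        # Assumption: host matched as a wildcard pattern, path as a directory prefix.
        if fnmatch(host, host_pat) and (path == src or path.startswith(prefix + "/")):
            return ("cp", local + path[len(prefix):])
    # No rule matched: the MOM would fall back to copying over the network.
    return ("remote", dest)


rules = parse_usecp(CONFIG)
print(delivery_method("login1.example.org:/home/alice/job.o123", rules))
print(delivery_method("login1:/home/alice/job.o123", rules))
```

In this model, a destination recorded with a short hostname such as `login1` does not match a `*.example.org` pattern and falls through to a remote copy, even though the path itself sits on a filesystem listed in usecp.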
Apart from the time restriction, the only other difference we can see
between these and the compute nodes is that these are also the submit hosts
for jobs. As far as we can tell, the config is the same as the one on our
existing cluster, where everything works as expected.
Can anyone shed some light on where we might look next?