[torqueusers] job execution error

Sam Rash srash at yahoo-inc.com
Thu Oct 5 11:42:28 MDT 2006

So we've got a 'drone script' that we've been running through this torque
server 10k times a day w/o problems.  Suddenly one node gets this in the
stderr (.ER file) for a job:


-bash: line 1: /home/y/var/pbs/mom_priv/jobs/1899889.med.SC: No such file or


Isn't that the generated script PBS makes for you when you do echo my
command | qsub ?

Does this simply mean

1) it wasn't created somehow? (newly created bug in our setup, newly
exposed bug in pbs?)

2) it got deleted somehow

3) we have cluster gnomes who come out at night and do strange things
to our boxes.


anyone else seen this?


Also, does torque have a feature that if say K jobs have failed on node Y
maybe in some time span T, automatically mark it offline and email the

(it seems we could write a quick perl hack to do this, but why reinvent..?)
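
For what it's worth, here is a rough sketch of what such a hack might look like, in Python rather than perl. It assumes you already have a list of (node, timestamp) pairs for failed jobs from somewhere (how you collect them is up to you and is not shown). The function names, K, and T are made up for illustration. The only torque piece it relies on is `pbsnodes -o <node>`, which marks a node offline; by default the sketch only prints the command instead of running it.

```python
import subprocess
from collections import defaultdict, deque


def nodes_over_threshold(failures, k, window_seconds):
    """Return the set of nodes that had at least k failures within
    any span of window_seconds.

    failures: iterable of (node_name, unix_timestamp) pairs, in any order.
    (Hypothetical input; gathering these pairs is left to the caller.)
    """
    recent = defaultdict(deque)  # node -> timestamps still inside the window
    flagged = set()
    for node, ts in sorted(failures, key=lambda f: f[1]):
        window = recent[node]
        window.append(ts)
        # Drop failures older than the time span T ending at this failure.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= k:
            flagged.add(node)
    return flagged


def offline_command(node):
    """Build the command that marks a node offline."""
    return ["pbsnodes", "-o", node]


def mark_offline(node, dry_run=True):
    """Print the offline command, or run it when dry_run is False."""
    cmd = offline_command(node)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Made-up sample data: three quick failures on n1, sparse ones on n2.
    sample = [("n1", 0), ("n1", 10), ("n1", 20), ("n2", 0), ("n2", 1000)]
    for bad_node in sorted(nodes_over_threshold(sample, k=3, window_seconds=60)):
        mark_offline(bad_node)
```

Run from cron with dry_run left on at first, so you can see which nodes it would take out before letting it act on its own.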





Sam Rash

srash at yahoo-inc.com




