[torqueusers] User's job can mess up the system so that no jobs run

Atwood, Robert C r.atwood at imperial.ac.uk
Thu Sep 6 10:31:56 MDT 2007


Hi, 
I suppose this does not happen often, since it is the first time in
several years of using OpenPBS and then Torque that it has happened on
my system.

One user submitted a malformed job that kept echoing a string to
stdout. Eventually it filled up the disk partition containing
/var/spool/torque. This happened on node01 (the first node in the list
of available nodes).

Subsequently, all users' jobs failed to run and returned no stdout or
stderr files, which made it difficult to tell what the problem actually
was. The jobs kept being directed to node01 because it was still marked
'free', even though its spool partition was full.

Is there a good way within Torque to prevent this behaviour? Apart from
banning certain users, that is!
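
To make the condition concrete, here is a minimal sketch of the kind of
per-node check I have in mind, assuming Python 3 is available on the
node. The function name and the 5% threshold are hypothetical choices
of mine; only the spool path comes from the situation described above.
Whether Torque can run something like this on each node and stop
sending jobs to a node that fails it is exactly what I am asking.

```python
import shutil
import socket
import sys

# Spool path as on my system; the threshold is an arbitrary, hypothetical value.
SPOOL_DIR = "/var/spool/torque"
MIN_FREE_FRACTION = 0.05


def spool_has_room(path=SPOOL_DIR, min_free_fraction=MIN_FREE_FRACTION):
    """Return True if the filesystem holding `path` has at least
    `min_free_fraction` of its total space free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


if __name__ == "__main__":
    # Exit non-zero when the spool partition is nearly full, so that
    # whatever runs this check can tell the node is unhealthy.
    if not spool_has_room():
        print("%s on %s is nearly full" % (SPOOL_DIR, socket.gethostname()))
        sys.exit(1)
```

Run by hand on node01 after the incident, a check like this would have
exited with status 1 and pointed straight at the full partition.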

Thanks
Robert

