[torquedev] Should a communication error between pbs_moms kill a job?

Chris Samuel csamuel at vpac.org
Wed Jul 15 19:11:26 MDT 2009

----- "Glen Beane" <glen.beane at gmail.com> wrote:

> However, based on this conversation, I think the best thing to do
> would be to get rid of this new attribute and change the mom code so
> that the mother superior never sets pjob->ji_nodekill when it gets an
> error from a POLL request...

Agreed. We apply the attached hack^Wpatch to (hopefully)
stop this from happening.
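
For anyone reading this without the attachment, the idea Glen describes
can be sketched roughly as below. This is only an illustration, not the
attached patch and not the real mom code: the struct, the enum, the
NO_NODE sentinel, the poll_errors counter and handle_sister_error() are
all hypothetical stand-ins, and only the field name ji_nodekill comes
from the discussion above. It also assumes that errors on non-POLL
requests keep marking the node as before.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real TORQUE definitions. */
#define NO_NODE (-1)   /* assumed sentinel meaning "no node marked" */

enum im_command { CMD_JOIN_JOB, CMD_KILL_JOB, CMD_POLL_JOB };

struct job_sketch {
    int ji_nodekill;   /* node whose failure should kill the job, or NO_NODE */
    int poll_errors;   /* failed polls seen so far (illustrative only) */
};

/*
 * Sketch of what the mother superior might do when a sister reports an
 * error for request `cmd`. Returns 1 if the job was marked to be
 * killed, 0 if the error was only recorded.
 */
int handle_sister_error(struct job_sketch *job, enum im_command cmd, int node_id)
{
    assert(job != NULL);

    if (cmd == CMD_POLL_JOB) {
        /* A failed poll may just be a transient comms problem:
         * record it, but never touch ji_nodekill. */
        job->poll_errors++;
        return 0;
    }

    /* Assumption: errors on other requests still mark the node. */
    job->ji_nodekill = node_id;
    return 1;
}
```

With a job initialised to { NO_NODE, 0 }, a POLL error from node 3
leaves ji_nodekill at NO_NODE, while an error on any other request
sets it to 3.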

Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-dont-kill-jobs-on-comms-error
Type: application/octet-stream
Size: 362 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/torquedev/attachments/20090716/1435ea72/attachment.obj 