[torqueusers] Epilogue.parallel scripts
Garrick Staples
garrick at usc.edu
Wed Sep 12 11:37:09 MDT 2007
On Wed, Sep 12, 2007 at 10:39:21AM -0700, Peter Wyckoff alleged:
>
> I notice in the docs that they don't get the exit code of the job run with
> qsub or pbsdsh in their environment. Is there a way to get this other than
> grepping the mom_logs?
You want to mark nodes offline based on the exit code of the job? So the next
time someone does 'echo blah blah blah | qsub', nodes get marked offline?
But to answer your question, no. I don't think sister nodes ever get the exit
value of the job.
You could always do this work from the normal epilogue.
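A minimal sketch of what that might look like in the normal epilogue on the mother superior. It assumes the job exit code arrives as the tenth argument, as the TORQUE epilogue documentation describes, and that negative values mean pbs_mom itself could not run the job; check both against your version. The function name classify_exit and the log-only policy are hypothetical. It only logs rather than offlining anything, since an ordinary non-zero exit from a user's job says nothing about node health.

```shell
#!/bin/sh
# Sketch of a mother-superior epilogue (not epilogue.parallel).
# Assumption: $1 is the job id and ${10} is the job exit code.

# classify_exit CODE
# Prints "mom-failure" for negative codes (assumed to mean pbs_mom could
# not start or run the job) and "job" for everything else.
classify_exit() {
    case "$1" in
        -[0-9]*) echo "mom-failure" ;;
        *)       echo "job" ;;
    esac
}

# Only act when invoked with the full epilogue argument list.
if [ "$#" -ge 10 ]; then
    jobid=$1
    exit_code=${10}
    if [ "$(classify_exit "$exit_code")" = "mom-failure" ]; then
        # Log only; a site could decide here whether to run
        # 'pbsnodes -o <node>' for the nodes involved.
        logger -t epilogue "job $jobid exit $exit_code: node may need attention"
    fi
fi
```

Exit codes in the 0-255 range come from the job itself, so treating them as node failures would take nodes offline for any user script that returns non-zero.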
> Also, this is run as root on a compute node so can't run pbsnodes -o
> <localhost> to take a bad machine out.
Why not?
> It can run momctl -s, but that isn't as nice as taking it offline. Is there
> another way to do this?
The health check script is really designed for this purpose.
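For illustration, a health check script can be as small as the sketch below. It assumes the usual TORQUE convention that pbs_mom runs the script named by $node_check_script in the mom config and reports any output line starting with ERROR to the server; what happens to the node after that depends on the scheduler and local settings. The scratch directory, the threshold, and the function name check_free_kb are hypothetical.

```shell
#!/bin/sh
# Sketch of a node health check script.
# Assumption: pbs_mom is configured with something like
#   $node_check_script /path/to/this/script
# and treats output beginning with "ERROR" as a node problem.

SCRATCH=/tmp        # hypothetical directory to watch
MIN_KB=1048576      # hypothetical threshold: 1 GB free

# check_free_kb FREE_KB MIN_KB
# Prints an ERROR line when FREE_KB is below MIN_KB, nothing otherwise.
check_free_kb() {
    if [ "$1" -lt "$2" ]; then
        echo "ERROR scratch space low: $1 KB free, need $2 KB"
    fi
}

# Free space in KB from the POSIX df output format (second line, column 4).
free=$(df -Pk "$SCRATCH" 2>/dev/null | awk 'NR==2 {print $4}')

# If df fails, report 0 KB free so the problem is visible.
check_free_kb "${free:-0}" "$MIN_KB"
```

Because the check looks at the node's own state rather than at a job's exit code, a user job that simply fails does not take the node out of service.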