[torqueusers] PBS Error: Execution server rejected request
Chris Samuel
csamuel at vpac.org
Sun Nov 6 18:08:59 MST 2005
On Sat, 5 Nov 2005 07:16 pm, garrick wrote:
> It's pretty much painless. Just install the new daemons and restart
> them. Don't restart MOMs on hosts that have running jobs.
Nice. I don't suppose you've kept a record of which versions you've upgraded
between without hitting these issues?
That could be *really* handy for me: even if all the running jobs hit their
walltimes, the earliest our cluster will be completely free is 20th January
2006. :-(
Also, I wonder if instead of
> restart MOMs on all idle nodes
> wait a minute, make sure node and job states are updating correctly
> mark busy nodes offline
It might be safer to do something like:
- mark all nodes offline
- restart MOMs on idle nodes
- clear offline attribute on idle nodes
Any thoughts?
cheers!
Chris
--
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia