[torqueusers] PBS Error: Execution server rejected request

Chris Samuel csamuel at vpac.org
Sun Nov 6 18:08:59 MST 2005


On Sat, 5 Nov 2005 07:16 pm, garrick wrote:

> It's pretty much painless.  Just install the new daemons and restart
> them.  Don't restart MOMs on hosts that have running jobs.

Nice - I don't suppose you've got a record of which versions you've upgraded
between without hitting these issues?

That could be *really* handy for me, as the earliest our cluster will be
completely free, if the running jobs all run to their walltimes, is the 20th
of January 2006. :-(

Also, I wonder if instead of

>   restart MOMs on all idle nodes
>   wait a minute, make sure node and job states are updating correctly
>   mark busy nodes offline

it might be safer to do something like:

 - mark all nodes offline
 - restart MOMs on idle nodes
 - clear offline attribute on idle nodes
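
To make the ordering concrete, here is a rough sketch of those three steps as
a dry-run plan. It only builds the command list and does not run anything. It
assumes `pbsnodes -o` and `pbsnodes -c` are used to set and clear the offline
attribute; the function name, the jobs-per-node input and the ssh/init-script
restart command are hypothetical placeholders, so substitute whatever your
site actually uses.

```python
def plan_rolling_restart(jobs_by_node, restart_cmd="/etc/init.d/pbs_mom restart"):
    """Return the commands, in order, for the three steps listed above.

    jobs_by_node is a hypothetical input: node name -> list of job ids
    currently running there (an empty list means the node is idle).
    restart_cmd is an assumed, site-specific way to restart pbs_mom.
    """
    nodes = sorted(jobs_by_node)
    idle = [n for n in nodes if not jobs_by_node[n]]

    plan = []
    # Step 1: mark every node offline so nothing new gets scheduled.
    plan += [["pbsnodes", "-o", n] for n in nodes]
    # Step 2: restart the MOM only where no jobs are running.
    plan += [["ssh", n, restart_cmd] for n in idle]
    # Step 3: clear the offline attribute on the idle nodes only;
    # busy nodes stay offline until they are dealt with later.
    plan += [["pbsnodes", "-c", n] for n in idle]
    return plan


if __name__ == "__main__":
    # Example with made-up node names and a made-up job id.
    for cmd in plan_rolling_restart({"node01": [], "node02": ["1234.server"]}):
        print(" ".join(cmd))
```

Busy nodes are left offline on purpose, so they would need the same restart
and clear once their jobs have finished.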

Any thoughts?

cheers!
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
