[torqueusers] PBS Error: Execution server rejected request

Garrick Staples garrick at usc.edu
Mon Nov 7 14:42:29 MST 2005

On Mon, Nov 07, 2005 at 12:08:59PM +1100, Chris Samuel alleged:
> On Sat, 5 Nov 2005 07:16 pm, garrick wrote:
> > It's pretty much painless.  Just install the new daemons and restart
> > them.  Don't restart MOMs on hosts that have running jobs.
> Nice - don't suppose you've got a record of which versions you've upgraded 
> between without hitting these issues ?

Unofficially, I suppose so.  This directory has every torque rpm that
I've ever had in production:


As you can see, some are pretty short-lived tiny jumps.  But there's
also some significant jumps like 1.0.1p6->1.1.0p4 and 1.1.0p4->1.2.0p1.
Before 1.0.1p6, I was using OpenPBS.

> Could be *really* handy for me as the earliest time our cluster will come 
> completely free if the running jobs hit their walltimes will be the 20th 
> January 2006.. :-(

I probably shouldn't, but I carefully mix and match different torque
versions all the time.  I've never had a problem.

> Also, I wonder if instead of
> >  - restart MOMs on all idle nodes
> >  - wait a minute, make sure node and job states are updating correctly
> >  - mark busy nodes offline
> It might be safer to do something like:
>  - mark all nodes offline
>  - restart MOMs on idle nodes
>  - clear offline attribute on idle nodes
> Any thoughts ?

Doesn't matter since the first step is killing the scheduler.  

Also, historically, OpenPBS/TORQUE didn't get job updates from offline
mother superior (MS) nodes.  That was fixed only relatively recently.

I think as long as you don't restart MOMs on active nodes, and kill the
scheduler while futzing around, it should be pretty safe no matter what
you do.
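A minimal sketch of that ordering in Python, assuming you already know how many jobs each node is running (e.g. from `pbsnodes -a` or `qstat`). It only builds a command list and runs nothing. `pbsnodes -o` / `pbsnodes -c` set and clear a node's offline state; the scheduler stop/start and MOM restart strings are hypothetical placeholders for whatever your site actually uses. Keep in mind the caveat above: on older releases an offline MS node may not report job updates.

```python
from typing import Dict, List

# Hypothetical placeholders: replace with however your site stops/starts
# the scheduler and restarts pbs_mom on a node (init script, ssh, ...).
STOP_SCHEDULER = "stop-scheduler"
START_SCHEDULER = "start-scheduler"
RESTART_MOM = "restart-mom {node}"


def plan_upgrade(node_jobs: Dict[str, int]) -> List[str]:
    """Build (but do not run) an ordered command list for a rolling upgrade.

    node_jobs maps node name -> number of jobs currently running there.
    """
    idle = sorted(n for n, jobs in node_jobs.items() if jobs == 0)
    busy = sorted(n for n, jobs in node_jobs.items() if jobs > 0)

    plan = [STOP_SCHEDULER]  # nothing gets scheduled while we work
    # Only idle nodes get a new MOM; busy nodes are never restarted here.
    plan += [RESTART_MOM.format(node=n) for n in idle]
    # (Manual step: check that node and job states are updating.)
    # Mark busy nodes offline so they drain instead of taking new jobs.
    plan += [f"pbsnodes -o {n}" for n in busy]
    plan.append(START_SCHEDULER)
    return plan


def plan_drained(node: str) -> List[str]:
    """Commands for a previously busy node once its jobs have finished."""
    return [RESTART_MOM.format(node=node), f"pbsnodes -c {node}"]


if __name__ == "__main__":
    for cmd in plan_upgrade({"n01": 0, "n02": 3, "n03": 0}):
        print(cmd)
```

With one busy node (n02) and two idle ones, this prints `stop-scheduler`, `restart-mom n01`, `restart-mom n03`, `pbsnodes -o n02`, `start-scheduler`; once n02's jobs finish, `plan_drained("n02")` restarts its MOM and clears the offline flag.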

Garrick Staples, Linux/HPCC Administrator
University of Southern California