[torqueusers] PBS Error: Execution server rejected request

notinh notien notinhnotien7 at hotmail.com
Fri Nov 4 13:16:39 MST 2005


Thank you, Mr. Staples.  Here is the config for the MOM. Originally, there were 
no $restricted directives.  The weird thing is that the other three cloned nodes 
have the exact same config file, and they are working right now.

$logevent 0x1ff
$restricted     10.0.1.250
$restricted     master.stellar.com

$clienthost     master
$usecp  *.stellar.com:/home /home
$usecp  *.stellar.com:/usr/local /usr/local

[root at master local]#/etc/rc.d/rc3.d/S85pbs_server restart
Shutting down PBS Server:                                  [  OK  ]
Starting PBS Server:                                       [  OK  ]
[root at master local]#momctl -d 3 -h node14

Host: node14.stellar.com/node14.stellar.com   Server: master   Version: torque_1.1.0p4
PID:                    4961
HomeDirectory:          /var/spool/pbs/mom_priv
MOM active:             64304 seconds
Last Msg From Server:   64205 seconds (CLUSTER_ADDRS)
Last Msg To Server:     16 seconds
LOGLEVEL:               7 (use SIGUSR1/SIGUSR2 to adjust)
JobList:                NONE

diagnostics complete
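
In the output above, "Last Msg From Server" (64205 seconds) is nearly as large 
as "MOM active" (64304 seconds) even though pbs_server had just been restarted, 
which suggests the counter was not reset.  A minimal sketch of how that check 
could be scripted follows.  The function names are hypothetical and not part of 
TORQUE, and the parsing assumes the momctl -d 3 output format shown in this 
message.

```shell
# Print the "Last Msg From Server" age (in seconds) from momctl output on stdin.
# Assumes a line of the form: "Last Msg From Server:   64205 seconds (...)".
last_msg_from_server() {
    awk -F: '/^Last Msg From Server/ { split($2, f, " "); print f[1] }'
}

# Succeed if the counter dropped between two readings.
#   $1 = seconds reported before the restart
#   $2 = seconds reported after the restart
counter_was_reset() {
    [ "$2" -lt "$1" ]
}

# Example use on the server host (not run here):
#   before=$(momctl -d 3 -h node14 | last_msg_from_server)
#   /etc/rc.d/rc3.d/S85pbs_server restart
#   after=$(momctl -d 3 -h node14 | last_msg_from_server)
#   counter_was_reset "$before" "$after" && echo "MOM heard from the server"
```

If the second reading is not lower than the first, the MOM has probably not 
received anything from the restarted server.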

I actually have a newer version in place, but the cluster is quite busy and I 
don't have much experience migrating currently running jobs to a new server.  I 
found some docs on the site about running two servers at the same time, 
but I have not found docs that show how to migrate running jobs to the new 
server, or how to replace the old server with the new one with little impact 
on the jobs.  I would appreciate any help with this.

Thank you very much for your help.
Steven.

>From: Garrick Staples <garrick at usc.edu>
>To: torqueusers at supercluster.org
>Subject: Re: [torqueusers] PBS Error: Execution server rejected request
>Date: Fri, 4 Nov 2005 11:32:11 -0800
>
>On Fri, Nov 04, 2005 at 06:07:24AM +0700, notinh notien alleged:
> > Oh, sorry.  I did not perform the correct command for one of the test.
> > Here is the correct output.
> > [root at master ] momctl -d 3 -h node14
> >
> > Host: node14.stellar.com/node14.stellar.com   Server: master   Version:
> > torque_1.1.0p4
> > HomeDirectory:          /var/spool/pbs/mom_priv
> > MOM active:             3717 seconds
> > Last Msg From Server:   3717 seconds (CLUSTER_ADDRS)
> > Last Msg To Server:     2 seconds
> > LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
> > JobList:                NONE
> >
> > diagnostics complete
>
>Can we see the MOM config file?
>
>If you restart pbs_server, does the "Last Msg From Server" number above
>get reset to a new lower number?
>
>And please consider upgrading!  MOM<->server communications are vastly
>improved in newer versions.
>
>--
>Garrick Staples, Linux/HPCC Administrator
>University of Southern California

