[torqueusers] PBS Error: Execution server rejected request
notinh notien
notinhnotien7 at hotmail.com
Fri Nov 4 13:16:39 MST 2005
Thank you, Mr. Staples. Here is the config for the MOM. Originally, there were no
$restricted directives. The weird thing is that the other three cloned nodes have
the exact same config file, and they are working right now.
$logevent 0x1ff
$restricted 10.0.1.250
$restricted master.stellar.com
$clienthost master
$usecp *.stellar.com:/home /home
$usecp *.stellar.com:/usr/local /usr/local
[root@master local]# /etc/rc.d/rc3.d/S85pbs_server restart
Shutting down PBS Server: [ OK ]
Starting PBS Server: [ OK ]
[root@master local]# momctl -d 3 -h node14
Host: node14.stellar.com/node14.stellar.com Server: master Version: torque_1.1.0p4
PID: 4961
HomeDirectory: /var/spool/pbs/mom_priv
MOM active: 64304 seconds
Last Msg From Server: 64205 seconds (CLUSTER_ADDRS)
Last Msg To Server: 16 seconds
LOGLEVEL: 7 (use SIGUSR1/SIGUSR2 to adjust)
JobList: NONE
diagnostics complete
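
To compare these counters before and after a pbs_server restart, a small script along the following lines could pull them out of the momctl output. This is only a sketch: the field names are copied from the output above, the function name parse_momctl is hypothetical, and the sample text is pasted in by hand rather than read from a live momctl call.

```python
import re

def parse_momctl(text):
    """Extract the 'seconds' counters from `momctl -d 3` output (hypothetical helper)."""
    counters = {}
    for key in ("MOM active", "Last Msg From Server", "Last Msg To Server"):
        match = re.search(re.escape(key) + r":\s+(\d+)\s+seconds", text)
        if match:
            counters[key] = int(match.group(1))
    return counters

# Sample lines copied from the momctl output above.
sample = """\
MOM active: 64304 seconds
Last Msg From Server: 64205 seconds (CLUSTER_ADDRS)
Last Msg To Server: 16 seconds
"""

counters = parse_momctl(sample)

# A "Last Msg From Server" value close to "MOM active" suggests the server
# has not contacted this MOM since shortly after the MOM started.
gap = counters["MOM active"] - counters["Last Msg From Server"]
print(counters)
print("seconds between MOM start and last server message:", gap)
```

Running it once before the restart and once after would show whether "Last Msg From Server" drops to a small number.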
I actually have a newer version in place, but the cluster is quite busy and I
don't have much experience migrating currently running jobs to a new server. I
found some docs on the site about running two servers at the same time,
but I have not located docs that show how to migrate running jobs to the new
server, or how to replace the old server with the new one with little impact
on the jobs. Please help me with these things.
Thank you very much for your help.
Steven.
>From: Garrick Staples <garrick at usc.edu>
>To: torqueusers at supercluster.org
>Subject: Re: [torqueusers] PBS Error: Execution server rejected request
>Date: Fri, 4 Nov 2005 11:32:11 -0800
>
>On Fri, Nov 04, 2005 at 06:07:24AM +0700, notinh notien alleged:
> > Oh, sorry. I did not run the correct command for one of the tests.
> > Here is the correct output.
> > [root@master ] momctl -d 3 -h node14
> >
> > Host: node14.stellar.com/node14.stellar.com Server: master Version:
> > torque_1.1.0p4
> > HomeDirectory: /var/spool/pbs/mom_priv
> > MOM active: 3717 seconds
> > Last Msg From Server: 3717 seconds (CLUSTER_ADDRS)
> > Last Msg To Server: 2 seconds
> > LOGLEVEL: 0 (use SIGUSR1/SIGUSR2 to adjust)
> > JobList: NONE
> >
> > diagnostics complete
>
>Can we see the MOM config file?
>
>If you restart pbs_server, does the "Last Msg From Server" number above
>get reset to a new lower number?
>
>And please consider upgrading! MOM<->server communications are vastly
>improved in newer versions.
>
>--
>Garrick Staples, Linux/HPCC Administrator
>University of Southern California