[torqueusers] pbs_server init script?

Coyle, James J [ITACD] jjc at iastate.edu
Mon Apr 5 09:57:48 MDT 2010


Peter,

   Suggestion:

     Move the pbs init scripts later in the boot sequence, after
the other services are up, e.g. to S99pbs_server.

Reasoning
---------
   The only difference between your two cases,
1) starting pbs from the init scripts linked as S20 and
2) starting pbs after the system has come up,
is that in case 2 all the services are already running,
so in case 1 the init script must be launching too early.

   Since this is a remote node, I'd guess the problem is that networking 
is not started yet.

   The scripts run in alphabetical order, so make sure that the network init
script runs first. I've seen networking as S30network or S40network,
but on my current Red Hat system it is S10network. I think that
S20pbs_server would work OK with S10network, but not with S30network.
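   A minimal sketch of the ordering rule, assuming the traditional sequential
rc that expands S* as a shell glob and runs each link in turn. The link names
are the examples from this thread; the helper names are made up for
illustration.

```python
import re


def start_order(names):
    """Return the S* (start) links in the order a sequential SysV rc runs them."""
    # sorted() compares the names as strings, like the shell glob S* in the
    # C locale, so the two-digit prefix decides which script starts first.
    return sorted(n for n in names if n.startswith("S"))


def service_of(link):
    """Strip the leading S/K and sequence number: 'S20pbs_server' -> 'pbs_server'."""
    return re.sub(r"^[SK]\d+", "", link)


def runs_before(names, first, second):
    """True if service `first` is started before service `second` in this runlevel."""
    order = [service_of(n) for n in start_order(names)]
    return order.index(first) < order.index(second)


# Current Red Hat layout described above: network at S10, pbs_server at S20.
print(runs_before(["S10network", "S20pbs_server"], "network", "pbs_server"))  # True
# Older layout with network at S30: pbs_server would start first.
print(runs_before(["S20pbs_server", "S30network"], "network", "pbs_server"))  # False
```

   K* (kill) links are ignored here because only the S* links start services
when entering a runlevel.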

   I run my pbs scripts late (S85) because I want other services up first.
There is no real need for pbs_server to be up immediately, as no other service
depends on it, but pbs depends on other services.

   The reason it might raise the error only occasionally is that pbs_server
starts running in the background (it is a daemon), so init processing carries on
and the networking script may be invoked soon afterward. It is then a race between
the network coming up and pbs_server reaching its setup_nodes procedure, with
pbs_server having a head start. Networking is not a daemon, so when the network
init script completes, the network is up.
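   The log line "host worker01 not found" is consistent with pbs_server failing
to look up a node name while reading its nodes file, which fits the race above.
One way to close that window, sketched here as a hypothetical pre-start check
(none of these helpers are part of TORQUE), is to have the init script wait
until every host in the nodes file resolves before launching pbs_server. The
nodes-file format assumed is one node per line with the host name first
(e.g. "worker01 np=4"), with '#' comments; adjust the parsing to your file.

```python
import socket
import time


def node_names(nodes_text):
    """Return host names from a nodes file (assumed: host name is the first field)."""
    names = []
    for line in nodes_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop '#' comments (assumed syntax)
        if line:
            names.append(line.split()[0])
    return names


def unresolved(hosts):
    """Return the hosts the system resolver cannot look up right now."""
    missing = []
    for host in hosts:
        try:
            socket.gethostbyname(host)
        except OSError:  # socket.gaierror and socket.herror both subclass OSError
            missing.append(host)
    return missing


def wait_for_hosts(hosts, timeout=60.0, interval=2.0):
    """Poll until every host resolves or `timeout` seconds pass; return what is still missing."""
    deadline = time.monotonic() + timeout
    missing = unresolved(hosts)
    while missing and time.monotonic() < deadline:
        time.sleep(interval)
        missing = unresolved(missing)
    return missing
```

   For example, read server_priv/nodes under your TORQUE home directory (the
exact path depends on the install), pass node_names(text) to wait_for_hosts(),
and start pbs_server only when the returned list is empty. A name listed in
/etc/hosts normally resolves even before the network is up, so this check only
helps when the node names come from DNS or another network source.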

   I wouldn't move networking earlier, since you are probably also running a firewall 
and you want to make sure that the firewall comes up before exposing the system to
the network.

Best of luck with your clustering,
 - Jim C.


 James Coyle, PhD
 High Performance Computing Group     
 115 Durham Center            
 Iowa State Univ.           phone: (515)-294-2099
 Ames, Iowa 50011           web: http://www.public.iastate.edu/~jjc




-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Peter Smith
Sent: Monday, April 05, 2010 7:22 AM
To: skip at pobox.com
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] pbs_server init script?

When installing the scripts as explained in the documentation the
script is symlinked as S20pbs_server/S20pbs_server in rc1.d to rc6.d.
The hostname is set in S02hostname.sh in rcS.d, so I would say that the
hostname is configured when running pbs_server. I have also tried to
symlink the pbs_server script as S99pbs_server but that has not
removed the error either.

worker01 is the first client in the nodes file.

On Mon, Apr 5, 2010 at 2:01 PM,  <skip at pobox.com> wrote:
>
>    Peter> PBS_Server;Svr;PBS_Server;LOG_ERROR::process_host_name_part, host
>    Peter> worker01 not found
>    ...
>    Peter> Any suggestions on  what could be wrong?
>
> Are your PBS init scripts being run before the hostname has been set?
>
> --
> Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/
>
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

