[torqueusers] pbs_server init script?
Coyle, James J [ITACD]
jjc at iastate.edu
Mon Apr 5 09:57:48 MDT 2010
Peter,
Suggestion:
Move the pbs init scripts to later in the process after
the other services are up, e.g. to S99pbs_server.
Reasoning
---------
The only difference between your example cases:
1) starting pbs from init scripts starting with S20 and
2) starting pbs once the system comes up
is that all the services are already running in case 2),
so the init script must be launching too early in case 1).
Since this is a remote node, I'd guess the problem is that networking
is not started yet.
The scripts run in alphabetical order, so make sure that the network init
script is run first. I've seen networking being S30network or S40network,
but I see that on my current RedHat system it is now S10network. I think that
S20pbs_server would work OK with S10network, but not with S30network.
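
To make the ordering argument concrete, here is a minimal Python sketch that sorts the start links the way a SysV-style rc script would and checks which service starts first. It assumes plain lexical ordering of names of the form SNNname; the helper names start_order and starts_before are hypothetical and not part of TORQUE or any init system.

```python
def start_order(link_names):
    """Return the S* links in the order a SysV-style rc script would run them.

    Assumption: start links run in plain lexical order of their file names.
    """
    return sorted(name for name in link_names if name.startswith("S"))


def starts_before(link_names, first, second):
    """Return True if the start link for `first` sorts ahead of `second`."""
    order = start_order(link_names)
    # Drop the leading "S" and the two-digit priority to get the service name.
    services = [name[3:] for name in order]
    return services.index(first) < services.index(second)


# The three network priorities mentioned above, each paired with S20pbs_server.
for network_link in ("S10network", "S30network", "S40network"):
    links = ["S20pbs_server", network_link]
    print(network_link, starts_before(links, "network", "pbs_server"))
```

Only the S10network case reports the network starting ahead of pbs_server, which matches the guess above that S20pbs_server is safe with S10network but not with S30network.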
I run my pbs scripts late (S85) because I want other services up first.
There is no real need for pbs_server to be up immediately, as no other service
depends on it, but pbs depends on other services.
The reason why it might occasionally raise the error is that the pbs_server
program starts running in the background (since it is a daemon) so the init processing
can go on, and the networking script may be invoked soon afterward. Then it is just a
race to see whether the network comes up before pbs_server gets to the setup_nodes
procedure, with pbs_server having a head start. Networking is not a daemon, so when the
network init script completes, the network is up.
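
If reordering the scripts is not an option, one way to take the race out of the picture is to wait until the node name resolves before pbs_server is launched. The following is only a sketch of that idea in Python, not something TORQUE provides: wait_for_host is a hypothetical helper, and the timeout and interval values are arbitrary assumptions.

```python
import socket
import time


def wait_for_host(name, timeout=60.0, interval=2.0, resolve=socket.gethostbyname):
    """Poll until `name` resolves; return True on success, False on timeout.

    `resolve` defaults to socket.gethostbyname and can be swapped out for
    testing. All names and default values here are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            resolve(name)
            return True
        except OSError:
            # socket.gaierror is a subclass of OSError; treat it as "not up yet".
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A wrapper around the init script could call wait_for_host("worker01") and start pbs_server only once it returns True. Running the init script after networking is still the simpler fix.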
I wouldn't move networking earlier, since you are probably also running a firewall
and you want to make sure that the firewall comes up before exposing the system to
the network.
Best of luck with your clustering,
- Jim C.
James Coyle, PhD
High Performance Computing Group
115 Durham Center
Iowa State Univ. phone: (515)-294-2099
Ames, Iowa 50011 web: http://www.public.iastate.edu/~jjc
-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Peter Smith
Sent: Monday, April 05, 2010 7:22 AM
To: skip at pobox.com
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] pbs_server init script?
When installing the scripts as explained in the documentation, the
script is symlinked as S20pbs_server/S20pbs_server in rc1.d to rc6.d.
The hostname is set in S02hostname.sh in rcS.d, so I would say that the
hostname is already configured when pbs_server runs. I have also tried to
symlink the pbs_server script as S99pbs_server, but that has not
removed the error either.
worker01 is the first client in the nodes file.
On Mon, Apr 5, 2010 at 2:01 PM, <skip at pobox.com> wrote:
>
> Peter> PBS_Server;Svr;PBS_Server;LOG_ERROR::process_host_name_part, host
> Peter> worker01 not found
> ...
> Peter> Any suggestions on what could be wrong?
>
> Are your PBS init scripts being run before the hostname has been set?
>
> --
> Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/
>