[torqueusers] Re: LAM-MPI won't boot with torque-1.2.0p6
troy at osc.edu
Thu Sep 15 12:34:31 MDT 2005
On Thu, 2005-09-15 at 10:45 -0700, Garrick Staples wrote:
> On Thu, Sep 15, 2005 at 01:10:56PM -0400, Troy Baer alleged:
> > Do you have $clienthost entries in $PBS_HOME/mom_priv/config for all of
> > your compute nodes? If not, I suspect that's your problem, as pbs_mom
> > needs a $clienthost entry for every host that's allowed to talk to it,
> > server and moms.
> No you don't. You need entries for all non-node hosts. The first entry
> must be your pbs_server host, and add additional entries for other hosts
> if you want to be able to run 'momctl' or 'dumpmom'.
That's not what the pbs_mom manpage says, FWIW:
which causes a host name to be added to the list of hosts
which will be allowed to connect to MOM as long as they
are using a privileged port. For example, here are two
configuration file lines which will allow the hosts
"fred" and "wilma" to connect:
Two host names are always allowed to connect to
pbs_mom, "localhost" and the name returned to pbs_mom by
the system call gethostname(). These names need not be
specified in the configuration file. The hosts listed as
"clienthosts" comprise a "sisterhood" of machines. Any
one of the sisterhood will accept connections from a
server from within the sisterhood. They will also accept
Resource Monitor (RM) requests and Internal MOM (IM)
messages from within the sisterhood. For a sisterhood to be
able to communicate IM messages to each other, they must
all share the same RM port.
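
To make the rule in that excerpt concrete, here is a rough Python sketch of the access check as the man page words it. It is only a model of the documented behavior, not TORQUE's actual code: the function names, the "node01" local host name, and the 1024 cutoff for a "privileged" port are my own assumptions, and the two config lines are a guess at the "fred"/"wilma" example the man page refers to.

```python
import socket

# Assumption: "privileged port" means the usual Unix range below 1024.
PRIVILEGED_PORT_LIMIT = 1024


def parse_clienthosts(config_text):
    """Collect host names from $clienthost lines in mom_priv/config-style text."""
    hosts = []
    for line in config_text.splitlines():
        fields = line.split()
        # Only lines of the form "$clienthost <hostname>" are considered.
        if len(fields) >= 2 and fields[0] == "$clienthost":
            hosts.append(fields[1])
    return hosts


def is_allowed(host, port, clienthosts, local_name=None):
    """Model of the man page rule: known host AND privileged source port."""
    if local_name is None:
        local_name = socket.gethostname()
    # "localhost" and the gethostname() result are always allowed.
    allowed = {"localhost", local_name} | set(clienthosts)
    return host in allowed and 0 < port < PRIVILEGED_PORT_LIMIT


# Hypothetical config text using the man page's example host names.
example_config = """\
$clienthost fred
$clienthost wilma
"""

hosts = parse_clienthosts(example_config)
print(is_allowed("fred", 1023, hosts, local_name="node01"))    # True
print(is_allowed("fred", 40000, hosts, local_name="node01"))   # False
print(is_allowed("barney", 1023, hosts, local_name="node01"))  # False
```

Read this way, a MOM with no $clienthost lines at all would accept only itself and "localhost", which is why I expected every compute node to need an entry.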
> pbs_server propagates a list of all nodes to every node in your cluster.
If this is the case (and my experiments with TORQUE just now indicate
that it is, somewhat to my surprise), then the section of the pbs_mom
man page cited above is wrong and needs to be corrected.
I guess you learn something every day. :)
Troy Baer troy at osc.edu
Science & Technology Support http://www.osc.edu/hpc/
Ohio Supercomputer Center 614-292-9701