[torqueusers] Re: LAM-MPI won't boot with torque-1.2.0p6

Troy Baer troy at osc.edu
Thu Sep 15 12:34:31 MDT 2005


On Thu, 2005-09-15 at 10:45 -0700, Garrick Staples wrote:
> On Thu, Sep 15, 2005 at 01:10:56PM -0400, Troy Baer alleged:
> > Do you have $clienthost entries in $PBS_HOME/mom_priv/config for all of
> > your compute nodes?  If not, I suspect that's your problem, as pbs_mom
> > needs a $clienthost entry for every host that's allowed to talk to it,
> > server and moms.
> 
> No, you don't.  You need entries for all non-node hosts.  The first entry
> must be your pbs_server host, and add additional entries for other hosts
> if you want to be able to run 'momctl' or 'dumpmom'.

That's not what the pbs_mom manpage says, FWIW:

    clienthost
           which causes a host name to be added to the list of hosts
           which will be allowed to connect to MOM as long as they
           are using a privileged port.  For example, here are two
           configuration file lines which will allow the hosts
           "fred" and "wilma" to connect:

           $clienthost      fred
           $clienthost      wilma

           Two host names are always allowed to connect to
           pbs_mom, "localhost" and the name returned to pbs_mom by
           the system call gethostname().  These names need not be
           specified in the configuration file.  The hosts listed as
           "clienthosts" comprise a "sisterhood" of machines.  Any
           one of the sisterhood will accept connections from a
           server from within the sisterhood.  They will also accept
           Resource Monitor (RM) requests and Internal MOM (IM)
           messages from within the sisterhood.  For a sisterhood to be
           able to communicate IM messages to each other, they must
           all share the same RM port.

> pbs_server propagates a list of all nodes to every node in your cluster.

If this is the case (and my experiments with TORQUE just now indicate
that it is, somewhat to my surprise), then the section of the pbs_mom
man page cited above is wrong and needs to be corrected.
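
For the archive, here is a minimal sketch of what a compute node's
mom_priv/config might look like under Garrick's description.  The host
names are made up, and I'm assuming '#' comment lines are accepted in
this file:

    # $PBS_HOME/mom_priv/config on a compute node (hypothetical names)
    # First entry: the pbs_server host.
    $clienthost      server.example.org
    # Optional: a non-node host that should be able to run momctl.
    $clienthost      admin.example.org

The other compute nodes are not listed, since pbs_server is said to
propagate the node list to every mom.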

I guess you learn something every day. :)

	--Troy
-- 
Troy Baer                       troy at osc.edu
Science & Technology Support    http://www.osc.edu/hpc/
Ohio Supercomputer Center       614-292-9701


