[torqueusers] Server not talking to MOMs at all

Garrick Staples garrick at usc.edu
Mon Aug 15 15:49:21 MDT 2005

On Mon, Aug 15, 2005 at 05:11:19PM -0400, Prakash Velayutham alleged:
> Garrick Staples wrote:
> >On Mon, Aug 15, 2005 at 03:24:41PM -0400, Prakash Velayutham alleged:
> >
> >>Here is the output of momctl -d 4 -h yy.yy.yy.yy (on the compute node):
> >
> >Does this work from the server?  Anything interesting in server's log 
> >files?
> >
> Hi Garrick,
> This is what I get from the server
> Host: xylose/xylose.dmzcluster.cchmc.org   Server: fructose   Version: torque_1.2.0p5
> HomeDirectory:          /var/spool/torque/mom_priv
> MOM active:             6223 seconds
> WARNING:  no messages received from server
> Last Msg To Server:     20 seconds
> Server Update Interval: 20 seconds
> WARNING:  no hello/cluster-addrs messages received from server
> Init Msgs Sent:         624 hellos
> LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
> Communication Model:    RPP
> TCP Timeout:            20 seconds
> Prolog Alarm Time:      300 seconds
> Alarm Time:             0 of 10 seconds
> Trusted Client List:,,,
> JobList:                NONE
> diagnostics complete
> Nothing strange at all in the server logs.

From that Trusted Client List, I'm making the following assumptions:
  xylose's IP is
  fructose has two interfaces: and
  xylose doesn't have access to your "live" 205.142 network and you intend for
    all cluster traffic to be on the 192.168 network.

Verify that the first $clienthost in your mom config resolves to
with matching forward and reverse.

Also verify that the names in $PBSHOME/server_priv/nodes resolve to
with matching forward and reverse.
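
For the forward/reverse checks, something like the following could be run on
both the server and the compute node. It is only a minimal sketch: the function
name and its resolver parameters are made up for illustration, it looks at IPv4
only, and it treats a name as matching if the reverse lookup returns it either
as the primary name or as an alias.

```python
import socket


def check_forward_reverse(name, forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name, ok) for one hostname.

    ok is True when the reverse lookup of the forward address gives back
    the same name, either as the primary name or as one of the aliases.
    The comparison is case-insensitive.
    """
    ip = forward(name)                       # forward lookup: name -> IPv4 address
    primary, aliases, _addrs = reverse(ip)   # reverse lookup: address -> names
    known = {n.lower() for n in [primary, *aliases]}
    return ip, primary, name.lower() in known
```

Call it with each name from the mom config and from the nodes file, exactly as
written there, and compare the returned address against the interface you
expect cluster traffic to use. A failed lookup raises OSError (socket.gaierror
or socket.herror), which is itself a useful answer.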

Either MOM is sending HELLOs to the wrong IP, the HELLOs are being blocked by
port filtering or a firewall, or the server is sending INIT messages back to the
wrong place.  My suspicion is the first possibility.

Crank the loglevels on mom and server all the way up to 9.  MOM will log where
it is sending HELLOs, and the server will log who it got HELLOs from.
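
The momctl output above says the MOM loglevel can be adjusted with
SIGUSR1/SIGUSR2, so one way to raise it without restarting is to signal the
running pbs_mom. The sketch below assumes SIGUSR1 raises the level by one and
SIGUSR2 lowers it; check the pbs_mom man page for your version before relying
on that. The function name and the delay are my own choices, and you have to
supply the pid yourself.

```python
import os
import signal
import time


def raise_mom_loglevel(pid, steps, delay=0.2, send=os.kill):
    """Send SIGUSR1 to process `pid` `steps` times; return the number sent.

    Assumption: each SIGUSR1 raises the MOM loglevel by one step.
    Standard signals are not queued, so a short pause between sends
    lowers the chance that several of them collapse into one delivery.
    """
    for _ in range(steps):
        send(pid, signal.SIGUSR1)
        time.sleep(delay)
    return steps
```

Run it as root (or as whoever owns pbs_mom), then confirm with momctl that the
LOGLEVEL line shows the value you expected.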

Garrick Staples, Linux/HPCC Administrator
University of Southern California