[torqueusers] Mom Terminates Job before it is run

Adam Emerich aemerich at us.ibm.com
Wed Jul 11 13:55:18 MDT 2007


torqueusers-bounces at supercluster.org wrote on 07/11/2007 01:36:41 PM:

> On Wed, Jul 11, 2007 at 01:07:43PM -0500, Adam Emerich alleged:
> > torqueusers-bounces at supercluster.org wrote on 07/10/2007 01:49:28 PM:
> >
> > > On Tue, Jul 10, 2007 at 10:03:16AM -0500, Adam Emerich alleged:
> > > > Jul 10 09:30:37 n01-001-0 pbs_mom: Success (0) in TMomFinalizeChild, cannot open qsub sock
> > >
> > > For interactive jobs, qsub creates and listens on a socket, and
> > > pbs_mom must be able to connect to that socket.  If pbs_mom can't
> > > connect to qsub, then job launch is a failure.
> >
> > In our case the management node has a hostname of mn, so the torque
> > request to the mom is coming down with mn as the hostname.  The mom
> > could communicate with the management node but needs to do so using
> > mnc as the hostname for the management node.  Does the torque socket
> > that is opened have limitations on where the connection comes from to
> > have a successful connection?  In our case the outgoing traffic from
> > the management node goes out on ip 172.16.0.1, however when the
> > compute nodes communicate back to the management nodes, they will use
> > 172.15.0.1.
>
> No, qsub doesn't care.
>
> Is the problem the other way around?  pbs_mom can't get to 'mn'?

Yes, I know that is the case.  Compute nodes cannot ping hostname mn.
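
A quick way to confirm this from a compute node, as a minimal sketch: the
hostnames mn and mnc are the ones from this thread, while the helper name
resolve() and its resolver parameter are hypothetical and only there for
illustration.  It checks name resolution only, not whether pbs_mom can
actually reach the qsub socket.

```python
import socket


def resolve(host, resolver=socket.gethostbyname):
    """Return the IPv4 address for host, or None if the lookup fails.

    `resolver` is a hypothetical hook so the lookup can be swapped out;
    by default it is the standard library's socket.gethostbyname.
    """
    try:
        return resolver(host)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None


if __name__ == "__main__":
    # "mn" and "mnc" are the two management-node names used in this thread.
    for name in ("mn", "mnc"):
        addr = resolve(name)
        print(f"{name}: {addr if addr else 'does not resolve'}")
```

If mn does not resolve on the compute node while mnc does, that would match
what I am seeing with ping.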
>
> You might try setting 'QSUBHOST mnc' in $PBS_SERVER_HOME/torque.cfg on
> mn.

I will try this, but I don't see this option documented in the Torque wiki.
Is it a new option?  Also, I currently do not have a torque.cfg file in
$PBS_SERVER_HOME.  Is it OK to just create this file, or does Torque need to
be configured and rebuilt to use it?
>
> --
> Garrick Staples, GNU/Linux HPCC SysAdmin
> University of Southern California
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


