[torqueusers] Mom Terminates Job before it is run

Garrick Staples garrick at usc.edu
Wed Jul 11 12:36:41 MDT 2007


On Wed, Jul 11, 2007 at 01:07:43PM -0500, Adam Emerich alleged:
> torqueusers-bounces at supercluster.org wrote on 07/10/2007 01:49:28 PM:
> 
> > On Tue, Jul 10, 2007 at 10:03:16AM -0500, Adam Emerich alleged:
> > > Jul 10 09:30:37 n01-001-0 pbs_mom: Success (0) in TMomFinalizeChild, cannot open qsub sock
> >
> > For interactive jobs, qsub creates and listens on a socket, and pbs_mom
> > must be able to connect to that socket.  If pbs_mom can't connect to
> > qsub, then job launch is a failure.
> 
> In our case the management node's hostname is mn, so the torque request
> reaches the mom with mn as the hostname.  The mom can communicate with the
> management node, but it needs to use mnc as the management node's hostname.
> Does the socket that torque opens restrict where a connection may come from
> in order to succeed?  In our case, outgoing traffic from the management node
> leaves on IP 172.16.0.1; however, when the compute nodes communicate back to
> the management node, they use 172.15.0.1.

No, qsub doesn't care.
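A minimal Python sketch of the pattern described earlier in the thread (qsub listens, pbs_mom connects back). It is for illustration only and is not TORQUE's code. The helper names open_listener and connect_back are hypothetical, and binding to 0.0.0.0 is an assumption about how such a listener is set up. A listener bound that way accepts connections on any interface, whatever source address the peer uses:

```python
import socket

def open_listener(bind_addr="0.0.0.0"):
    """Stand-in for the qsub side (hypothetical): listen on an ephemeral port.

    Binding to 0.0.0.0 accepts connections arriving on any local interface,
    so the peer's source address does not matter to the listener.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_addr, 0))          # port 0: let the kernel pick a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]  # the (host, port) pair would be advertised

def connect_back(host, port, timeout=5.0):
    """Stand-in for the pbs_mom side (hypothetical): try to reach host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                   # refused, unreachable, timeout, bad name...
        return False
```

If the connect-back step fails for the hostname and port that were advertised, the job launch fails, which is the situation behind the "cannot open qsub sock" message quoted above.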

Is the problem the other way around?  pbs_mom can't get to 'mn'?

You might try setting 'QSUBHOST mnc' in $PBS_SERVER_HOME/torque.cfg on
mn.
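To check whether pbs_mom can reach 'mn', one option is to look at what each name resolves to on a compute node. The small Python helper below does that. ipv4_addresses is a hypothetical name. It only checks name resolution, not whether qsub's port is actually reachable:

```python
import socket

def ipv4_addresses(name):
    """Return the sorted IPv4 addresses `name` resolves to, or [] if it doesn't resolve."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # Run on a compute node to compare what 'mn' and 'mnc' map to there.
    for host in ("mn", "mnc"):
        print(host, ipv4_addresses(host) or "does not resolve")
```

Suppose 'mn' does not resolve on the compute nodes, or resolves to an address they cannot reach, while 'mnc' does. That result is consistent with trying the QSUBHOST mnc setting suggested above.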

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html