[torqueusers] Mom Terminates Job before it is run
garrick at usc.edu
Wed Jul 11 14:18:06 MDT 2007
On Wed, Jul 11, 2007 at 02:55:18PM -0500, Adam Emerich alleged:
> torqueusers-bounces at supercluster.org wrote on 07/11/2007 01:36:41 PM:
> > On Wed, Jul 11, 2007 at 01:07:43PM -0500, Adam Emerich alleged:
> > > torqueusers-bounces at supercluster.org wrote on 07/10/2007 01:49:28 PM:
> > >
> > > > On Tue, Jul 10, 2007 at 10:03:16AM -0500, Adam Emerich alleged:
> > > > > Jul 10 09:30:37 n01-001-0 pbs_mom: Success (0) in cannot open qsub sock
> > > >
> > > > For interactive jobs, qsub creates and listens on a socket, and
> > > > must be able to connect to that socket. If pbs_mom can't connect to
> > > > qsub, then job launch is a failure.
> > >
> > > In our case the management node has a hostname of mn, so what Torque
> > > sends to the mom comes down with mn as the hostname. The mom can
> > > communicate with the management node, but needs to do so using mnc as
> > > the hostname for the management node. Does the socket that Torque opens
> > > restrict where a connection may come from? In our case the outgoing
> > > traffic from the management node goes out on IP 172.16.0.1; however,
> > > when the compute nodes communicate back to the management node, they
> > > will use 22.214.171.124.
> > No, qsub doesn't care.
> > Is the problem the other way around? pbs_mom can't get to 'mn'?
> Yes, I know that is the case. Compute nodes cannot ping hostname mn.
That is likely the problem.
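A quick way to confirm this from a compute node is to check whether each
name resolves at all. The following is only a minimal Python sketch; the
helper name resolve_host is hypothetical and not part of Torque, and the
hostnames mn and mnc are the ones from this thread.

```python
import socket

def resolve_host(name):
    """Return the IPv4 address for name, or None if it does not resolve.

    Hypothetical helper for illustration; not part of Torque.
    """
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None
```

On a compute node, resolve_host("mn") returning None while
resolve_host("mnc") returns an address would match the symptom described
above: pbs_mom is handed a hostname it cannot reach.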
> > You might try setting 'QSUBHOST mnc' in $PBS_SERVER_HOME/torque.cfg on
> > mn.
> I will try this, but I don't see this option documented in the Torque Wiki.
> Is it a new option? Also, I currently do not have a torque.cfg file in the
> $PBS_SERVER_HOME. Is it ok to just create this file or does torque need to
> be configured and rebuilt to use it?
I'll add it to the wiki. I need to add it to the manpage too.
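For illustration, the suggestion amounts to a single "QSUBHOST mnc" line
in torque.cfg. Below is a rough Python sketch of adding or replacing that
line. It assumes torque.cfg is a plain text file with one "KEY value"
entry per line; set_qsubhost is a hypothetical helper, not a Torque tool.

```python
def set_qsubhost(cfg_text, host):
    """Return cfg_text with exactly one 'QSUBHOST <host>' line.

    Assumes one 'KEY value' entry per line (an assumption about the
    torque.cfg layout, not something confirmed in this thread).
    """
    # Drop any existing QSUBHOST entry, keep every other line untouched.
    kept = [
        line for line in cfg_text.splitlines()
        if line.split()[:1] != ["QSUBHOST"]
    ]
    kept.append("QSUBHOST " + host)
    return "\n".join(kept) + "\n"
```

Called with an empty string and "mnc", this yields the one-line file
contents that would go in $PBS_SERVER_HOME/torque.cfg on mn.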
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.