[torqueusers] interactive qsub failure

Gabe Turner gabe at msi.umn.edu
Fri Apr 27 15:55:43 MDT 2012


On Fri, Apr 27, 2012 at 02:28:14PM -0700, Kenneth Yoshimoto wrote:
> 
> I'm seeing an intermittent failure with qsub -I
> 
> The message in /var/log/messages is:
> Apr 27 14:07:27 gcn-17-71 pbs_mom: LOG_ERROR::Operation now in progress (115) in TMomFinalizeChild, cannot open interactive qsub socket to host gordon-ln4.local:50620 - 'cannot connect to port 1023 in client_to_svr - connection refused' - check routing tables/multi-homed host issues
> 
> I think my routing is okay, as I can telnet to the login node
> port from the compute node.  I also see some packet exchange to
> the port with tcpdump.  Could the mom be attempting the connection
> before qsub starts listening?  I would have thought qsub would
> start listening before sending the job to pbs_server.  Any ideas
> on what might cause this?

In order for an interactive session to work, pbs_mom on the compute node needs
to open a connection back to the port that qsub is listening on at the
submission host (gordon-ln4.local:50620 in your log), so you'll want to make
sure that your firewall rules allow that.
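If you want to test that path more precisely than with telnet, a small probe run from the compute node can tell a refused connection apart from one that simply times out. A refusal usually means nothing is listening or a rule is actively rejecting the connection; a timeout usually points to packets being silently dropped. This is only a minimal sketch: the function name `probe`, the script name `probe.py` and the 5-second timeout are illustrative choices, and the port qsub listens on is only open while that particular `qsub -I` is still waiting, so run the probe during that window.

```python
import socket
import sys


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a TCP connect to host:port and classify the outcome.

    Hypothetical helper: returns "open", "refused", "timeout", or "error: ...".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing listening on the port, or a firewall actively rejecting it.
        return "refused"
    except socket.timeout:
        # No reply at all: often a firewall silently dropping the packets.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, etc.
        return f"error: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example (run on the compute node while a qsub -I is waiting):
    #   python3 probe.py gordon-ln4.local 50620
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    print(f"{target_host}:{target_port} -> {probe(target_host, target_port)}")
```

Because each interactive job may be given a different listening port, a single successful test against one port does not show that the firewall admits the whole range qsub might use.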

-- 
Gabe Turner                                             gabe at msi.umn.edu
HPC Systems Administrator,
University of Minnesota
Supercomputing Institute                          http://www.msi.umn.edu

