[torqueusers] interactive qsub failure

Kenneth Yoshimoto kenneth at sdsc.edu
Fri Apr 27 15:28:14 MDT 2012

I'm seeing an intermittent failure with qsub -I.

The message in /var/log/messages is:
Apr 27 14:07:27 gcn-17-71 pbs_mom: LOG_ERROR::Operation now in progress (115) in TMomFinalizeChild, cannot open interactive qsub socket to host gordon-ln4.local:50620 - 'cannot connect to port 1023 in client_to_svr - connection refused' - check routing tables/multi-homed host issues

I think my routing is okay, as I can telnet to the login node
port from the compute node.  I also see some packet exchange to
the port with tcpdump.  Could the mom be attempting the connection
before qsub starts listening?  I would have thought qsub would
start listening before sending the job to pbs_server.  Any ideas
on what might cause this?
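
One way to look for such a timing window is to probe the listening port
repeatedly from the compute node and record the result of each attempt.
The following is only a minimal sketch, not part of TORQUE: the function
name probe and its parameters are hypothetical, and it connects from an
ordinary unprivileged source port (like telnet), so it may not follow
the same path pbs_mom takes.

```python
import errno
import socket
import time


def probe(host, port, attempts=5, delay=0.2, timeout=2.0):
    """Try to connect to host:port several times.

    Returns one entry per attempt: None on success, otherwise the
    errno value of the failure (e.g. errno.ECONNREFUSED).
    """
    results = []
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success or an errno instead of raising
            code = sock.connect_ex((host, port))
        except OSError as exc:
            # e.g. the host name could not be resolved
            code = exc.errno or -1
        finally:
            sock.close()
        results.append(None if code == 0 else code)
        time.sleep(delay)
    return results


def describe(results):
    """Turn probe() results into readable strings for logging."""
    return ["ok" if r is None else errno.errorcode.get(r, str(r))
            for r in results]
```

For example, describe(probe("gordon-ln4.local", 50620)) run on the
compute node while the qsub -I session is waiting would show whether
any attempt is refused. The host and port here are taken from the log
line above; the port will differ for each qsub invocation.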


