[torqueusers] interactive qsub failure
kenneth at sdsc.edu
Mon Apr 30 16:10:04 MDT 2012
I do see the SYN, SYN/ACK, RST pattern on the failed
attempts. I'll give your suggestion a try.
On Fri, 27 Apr 2012, Michael Jennings wrote:
> Date: Fri, 27 Apr 2012 16:01:50 -0700
> From: Michael Jennings <mej at lbl.gov>
> Reply-To: Torque Users Mailing List <torqueusers at supercluster.org>
> To: torqueusers at supercluster.org
> Subject: Re: [torqueusers] interactive qsub failure
> On Friday, 27 April 2012, at 14:28:14 (-0700),
> Kenneth Yoshimoto wrote:
>> I'm seeing an intermittent failure with qsub -I.
>> The message in /var/log/messages is:
>> Apr 27 14:07:27 gcn-17-71 pbs_mom: LOG_ERROR::Operation now in progress (115) in TMomFinalizeChild, cannot open interactive qsub socket to host gordon-ln4.local:50620 - 'cannot connect to port 1023 in client_to_svr - connection refused' - check routing tables/multi-homed host issues
>> I think my routing is okay, as I can telnet to the login node
>> port from the compute node. I also see some packet exchange to
>> the port with tcpdump. Could the mom be attempting the connection
>> before qsub starts listening? I would have thought qsub would
>> start listening before sending the job to pbs_server. Any ideas
>> on what might cause this?
> Are you by any chance seeing a SYN, a SYN/ACK, and a RST?
> If so, try setting $max_conn_timeout_micro_sec to 500000 in your
> pbs_mom config and see if that helps.
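For reference, the suggested change is a single line in the pbs_mom
config file on the compute node. This is a minimal sketch, assuming the
file is at the common location /var/spool/torque/mom_priv/config (the
path depends on how TORQUE was installed). The value is in microseconds,
so 500000 corresponds to 0.5 seconds.

```
$max_conn_timeout_micro_sec 500000
```

pbs_mom typically has to be restarted, or told to reread its config,
before the new value takes effect.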