[torqueusers] interactive qsub failure

Kenneth Yoshimoto kenneth at sdsc.edu
Mon Apr 30 16:10:04 MDT 2012


I do see the syn, syn/ack, rst pattern with the failed
attempts.  I'll give your suggestion a try.

Thanks!
Kenneth

On Fri, 27 Apr 2012, Michael Jennings wrote:

> Date: Fri, 27 Apr 2012 16:01:50 -0700
> From: Michael Jennings <mej at lbl.gov>
> Reply-To: Torque Users Mailing List <torqueusers at supercluster.org>
> To: torqueusers at supercluster.org
> Subject: Re: [torqueusers] interactive qsub failure
> 
> On Friday, 27 April 2012, at 14:28:14 (-0700),
> Kenneth Yoshimoto wrote:
>
>>
>> I'm seeing an intermittent failure with qsub -I.
>>
>> The message in /var/log/messages is:
>> Apr 27 14:07:27 gcn-17-71 pbs_mom: LOG_ERROR::Operation now in progress (115) in TMomFinalizeChild, cannot open interactive qsub socket to host gordon-ln4.local:50620 - 'cannot connect to port 1023 in client_to_svr - connection refused' - check routing tables/multi-homed host issues
>>
>> I think my routing is okay, as I can telnet to the login node
>> port from the compute node.  I also see some packet exchange to
>> the port with tcpdump.  Could the mom be attempting the connection
>> before qsub starts listening?  I would have thought qsub would
>> start listening before sending the job to pbs_server.  Any ideas
>> on what might cause this?
>
> Are you by any chance seeing a SYN, a SYN/ACK, and a RST?
>
> If so, try setting $max_conn_timeout_micro_sec to 500000 in your
> pbs_mom config and see if that helps.
>
> Michael
>
>
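The log line reports errno 115 (EINPROGRESS), which is what a non-blocking connect() returns while the TCP handshake is still under way. One plausible reading of Michael's suggestion is that the mom waits only a bounded time for that handshake to finish. This is an assumption, not a description of pbs_mom's actual code. If the SYN/ACK arrives after the mom has given up and closed the socket, the compute node's kernel answers it with a RST, which matches the SYN, SYN/ACK, RST trace. The minimal Python sketch below shows that general pattern with the timeout given in microseconds, like `$max_conn_timeout_micro_sec`. The helper name `connect_with_timeout` is hypothetical and not part of TORQUE. In the mom's config file (normally `mom_priv/config` under the TORQUE home directory), the suggested setting would be a line of the form `$max_conn_timeout_micro_sec 500000`, that is, half a second.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host: str, port: int, timeout_us: int) -> socket.socket:
    """Hypothetical helper (not TORQUE code): TCP connect bounded by a
    timeout in microseconds, using a non-blocking connect() plus select()."""
    # Resolve to an IPv4 address; the sketch ignores IPv6 for brevity.
    addr = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)[0][4]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # A non-blocking connect normally returns EINPROGRESS (115 on Linux):
    # the SYN has been sent but the handshake has not finished yet.
    rc = sock.connect_ex(addr)
    if rc in (errno.EINPROGRESS, errno.EWOULDBLOCK):
        # Wait for the socket to become writable (handshake completed or
        # failed), but for no longer than timeout_us.
        _, writable, _ = select.select([], [sock], [], timeout_us / 1_000_000)
        if not writable:
            # Closing a socket still in SYN_SENT means a SYN/ACK that
            # arrives afterwards is answered with a RST by this host.
            sock.close()
            raise TimeoutError(f"connect to {host}:{port} not done in {timeout_us} us")
        # The outcome of the handshake is reported through SO_ERROR.
        rc = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if rc != 0:
        sock.close()
        # OSError maps errno values to subclasses, e.g. ConnectionRefusedError.
        raise OSError(rc, os.strerror(rc))
    sock.setblocking(True)
    return sock
```

With a generous timeout, the connect succeeds as soon as the listener's SYN/ACK comes back. With a timeout shorter than the round trip plus any delay on the qsub host, the helper raises TimeoutError instead, which is the analogue of an intermittent failure that goes away when the timeout is raised.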

