[torqueusers] Intermittent pbs_server connection problems upon upgrading

Garrick Staples garrick at usc.edu
Mon Jul 26 10:02:05 MDT 2010


The obvious place to start is to strace pbs_server and see whether it is hanging on anything.

But I don't think the problem is with pbs_server, because pbs_iff is returning with "Connection refused". I'm pretty sure that error is occurring before anything gets to pbs_server.
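
One way to check that from the submit host, independent of pbs_iff and qstat, is to open a plain TCP connection to the server port and see how it ends. The following is only a rough sketch in Python: the probe_tcp name and the timeout value are made up for illustration, and the host and port are simply the ones from the error message quoted below. It does not speak the pbs protocol, it only tells you whether the TCP connect itself is accepted, refused, or timing out.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection and report how it ended.

    Hypothetical helper for illustration; it only tests the TCP
    handshake and sends no data to the server.
    """
    try:
        # create_connection resolves the host and performs the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The peer (or something in between) answered with a reset.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else: name resolution failure, unreachable network, etc.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))


# Example use (host and port taken from the quoted error message):
#   for _ in range(20):
#       print(probe_tcp("torque.example.org", 15001))
```

If repeated probes come back "refused" or "timeout" some of the time while pbs_server logs nothing, that would point at the listen queue or the network path rather than at request handling inside pbs_server.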


On Jul 26, 2010, at 7:43 AM, Nate Coraor wrote:

> Hi all,
> 
> I've recently upgraded from 2.1.11 to 2.4.8 and since doing so, have 
> been experiencing a lot of delays in communication with pbs_server. 
> qstat often takes a bit (~5-10 seconds) to respond, and sometimes 
> doesn't at all (it looks like, if the response time is > 10 seconds), 
> failing with this error:
> 
> pbs_iff: cannot connect to torque.example.org:15001 - timeout, errno=146 
> (Connection refused) cannot connect to port 1022 in client_to_svr - 
> connection refused
> No Permission.
> qstat: cannot connect to server torque.example.org (errno=15007) 
> Unauthorized Request
> 
> Subsequent invocations of qstat succeed.  When this error is logged, 
> nothing interesting is happening in pbs_server, even if running with 
> loglevel 7, and the connection attempt is not logged at all.
> 
> I haven't completely ruled out connection problems, but at the very 
> least, packets aren't dropping or taking long to move between the submit 
> host and the server.
> 
> Is there an obvious place to start?
> 
> Thanks,
> --nate


