[torqueusers] Intermittent pbs_server connection problems upon upgrading

Nate Coraor nate at psu.edu
Mon Jul 26 08:43:22 MDT 2010


Hi all,

I recently upgraded from 2.1.11 to 2.4.8 and have since been seeing 
frequent delays in communication with pbs_server. qstat often takes 
5-10 seconds to respond, and sometimes fails outright (apparently when 
the response time exceeds 10 seconds) with this error:

pbs_iff: cannot connect to torque.example.org:15001 - timeout, errno=146 
(Connection refused) cannot connect to port 1022 in client_to_svr - 
connection refused
No Permission.
qstat: cannot connect to server torque.example.org (errno=15007) 
Unauthorized Request

Subsequent invocations of qstat succeed.  When the error occurs, 
pbs_server logs nothing of interest, even at loglevel 7, and the 
connection attempt itself is not logged at all.

I haven't completely ruled out network problems, but at the very 
least, packets aren't being dropped or delayed between the submit 
host and the server.
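
For anyone wanting to check the same thing independently of qstat and 
pbs_iff, a rough sketch of one approach is to time raw TCP connects to 
the server port. The host and port come from the error message above; 
the helper names time_connect and summarize, the 10 second timeout, and 
the sample count are assumptions for illustration, not anything TORQUE 
provides.

```python
import socket
import time


def time_connect(host, port, timeout=10.0):
    """Attempt one TCP connection; return (succeeded, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def summarize(samples):
    """Reduce a list of (ok, seconds) samples to a small report."""
    failures = sum(1 for ok, _ in samples if not ok)
    slowest = max((t for _, t in samples), default=0.0)
    return {"attempts": len(samples), "failures": failures, "slowest": slowest}


# Example use from the submit host (assumed host/port, taken from the error):
# samples = [time_connect("torque.example.org", 15001) for _ in range(20)]
# print(summarize(samples))
```

If plain connects are consistently fast while qstat still stalls, that 
would point away from the network and toward the server or the 
pbs_iff authentication step, though this only tests connection setup 
and not the protocol exchange itself.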

Is there an obvious place to start?

Thanks,
--nate

