[torqueusers] Intermittent pbs_server connection problems upon upgrading
nate at psu.edu
Mon Jul 26 08:43:22 MDT 2010
I recently upgraded from 2.1.11 to 2.4.8, and since then I have
been seeing long delays in communication with pbs_server.
qstat often takes ~5-10 seconds to respond, and sometimes it does not
respond at all (apparently whenever the response would take longer than
10 seconds), failing with this error:
pbs_iff: cannot connect to torque.example.org:15001 - timeout, errno=146
(Connection refused) cannot connect to port 1022 in client_to_svr -
qstat: cannot connect to server torque.example.org (errno=15007)
Subsequent invocations of qstat succeed. When this error is logged,
nothing interesting is happening in pbs_server, even if running with
loglevel 7, and the connection attempt is not logged at all.
I haven't completely ruled out connection problems, but at the very
least, packets aren't dropping or taking long to move between the submit
host and the server.
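To separate plain TCP connection delays from whatever qstat and pbs_iff
are doing, something like the following sketch could be run from the
submit host. It is not part of TORQUE: time_connect and probe are
hypothetical helper names, and it only times a bare TCP connect to the
host and port that appear in the error message above, so it says
nothing about authentication or the PBS protocol itself.

```python
import socket
import time


def time_connect(host, port, timeout=10.0):
    """Time a single TCP connect attempt.

    Returns (seconds_elapsed, error), where error is None on success
    or the OSError raised on failure (refused, timed out, DNS, ...).
    """
    start = time.monotonic()
    try:
        # Open and immediately close the connection; only the
        # connect latency is of interest here.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start, None
    except OSError as exc:
        return time.monotonic() - start, exc


def probe(host, port, attempts=20, pause=1.0, timeout=10.0):
    """Repeat time_connect several times and collect the results."""
    results = []
    for _ in range(attempts):
        results.append(time_connect(host, port, timeout))
        time.sleep(pause)
    return results
```

Calling probe("torque.example.org", 15001) and printing each
(elapsed, error) pair should show whether the connect itself is
occasionally slow or refused, independent of qstat.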
Is there an obvious place to start?