[torqueusers] Torque 2.1.x pbs_server process hogging cpu
Martin Schafföner
martin.schaffoener at e-technik.uni-magdeburg.de
Mon Jun 12 08:22:09 MDT 2006
Today I felt like doing some updates, so I first tried upgrading from torque
2.0.0p7 (not too old, I guess) to torque 2.1.0p0. Installing the software
went fine; however, when I now submit a job, the job isn't executed. Instead,
the pbs_server process eats all of the available CPU time.
So I restarted pbs_server with
"PBSLOGLEVEL=7 PBSDEBUG=1 /opt/torque/sbin/pbs_server"
and got the following output:
pbs_server is up
PBS_Server: Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004
and the logfile said (regarding the job):
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type AuthenticateUser request received from schaffoe at cluster, sock=12
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type QueueJob request received from schaffoe at cluster, sock=11
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type JobScript request received from schaffoe at cluster, sock=11
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type ReadyToCommit request received from schaffoe at cluster, sock=11
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type Commit request received from schaffoe at cluster, sock=11
06/12/2006 16:13:27;0100;PBS_Server;Job;9339.cluster;enqueuing into feed, state 1 hop 1
06/12/2006 16:13:27;0100;PBS_Server;Job;9339.cluster;dequeuing from feed, state QUEUED
06/12/2006 16:13:27;0100;PBS_Server;Job;9339.cluster;enqueuing into xs, state 1 hop 1
06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Queued at request of schaffoe at cluster, owner = schaffoe at cluster, job name = pbs_mpitest.pbs, queue = xs
06/12/2006 16:13:27;0040;PBS_Server;Svr;cluster;Scheduler sent command new
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type StatusNode request received from moab at cluster, sock=10
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type StatusJob request received from moab at cluster, sock=10
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type StatusQueue request received from moab at cluster, sock=10
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type ModifyJob request received from moab at cluster, sock=10
06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Modified at request of moab at cluster
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type RunJob request received from moab at cluster, sock=10
06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Run at request of moab at cluster
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type ModifyJob request received from moab at cluster, sock=10
06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Modified at request of moab at cluster
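To follow a single job through a log like this, each record can be split on its semicolons. The following is only a rough sketch in Python: it assumes every record sits on one line and has six semicolon-separated fields, as in the excerpt above. The field names (timestamp, event_code, daemon, category, object_id, message) and the helper names are labels made up for this illustration, not official TORQUE terminology.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    # Field names are illustrative labels, not official TORQUE terminology.
    timestamp: str
    event_code: str
    daemon: str
    category: str
    object_id: str
    message: str


def parse_record(line: str) -> Optional[LogRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # Split at most five times so semicolons inside the message are preserved.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*parts)


def records_for_job(lines: Iterable[str], job_id: str) -> List[LogRecord]:
    """Return the records whose fifth field equals job_id, in file order."""
    records = []
    for line in lines:
        rec = parse_record(line)
        if rec is not None and rec.object_id == job_id:
            records.append(rec)
    return records


# Three records copied from the excerpt above.
SAMPLE = [
    "06/12/2006 16:13:27;0100;PBS_Server;Req;;Type RunJob request received from moab at cluster, sock=10",
    "06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Run at request of moab at cluster",
    "06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Modified at request of moab at cluster",
]
```

Calling records_for_job(SAMPLE, "9339.cluster") keeps only the two "Job" records and drops the "Req" record, whose object field is empty.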
Nothing in this looks unusual to me. I did notice that the job script is
still sitting in the server_priv/jobs directory, in case that is relevant.
The latest 2.1.1 snapshot did not fix the problem either.
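The "Connection refused (111)" message in the console output refers to port 15004. A quick way to see whether anything accepts TCP connections on that port is sketched below in Python. This is only a diagnostic aid under assumptions: the function name is made up for this example, the host to test is a guess (the post does not say which host the server tried), and an open or closed port does not by itself explain the CPU usage.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when nothing is listening or the host cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_accepts_connections("localhost", 15004) could be run on the server node; "localhost" is an assumption here, and the port number is the one from the message above.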
Regards,
--
Martin Schafföner
Cognitive Systems Group, Institute of Electronics, Signal Processing and
Communication Technologies, Department of Electrical Engineering,
Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063