[torqueusers] Torque 2.1.x pbs_server process hogging cpu

Martin Schafföner martin.schaffoener at e-technik.uni-magdeburg.de
Mon Jun 12 08:22:09 MDT 2006


Today I felt like doing some updates, so I first tried upgrading from torque 
2.0.0p7 (not too old, I guess) to torque 2.1.0p0. Installing the software 
went fine; however, when I now submit a job, the job isn't executed. Instead, 
the pbs_server process eats all of the available CPU time.

So I restarted pbs_server with
"PBSLOGLEVEL=7 PBSDEBUG=1 /opt/torque/sbin/pbs_server"

and got the following output:

pbs_server is up
PBS_Server: Connection refused (111) in contact_sched, Could not contact 
Scheduler - port 15004

and the logfile said (regarding the job):

06/12/2006 16:13:27;0100;PBS_Server;Req;;Type AuthenticateUser request 
received from schaffoe at cluster, sock=12
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type QueueJob request received from 
schaffoe at cluster, sock=11
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type JobScript request received from 
schaffoe at cluster, sock=11
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type ReadyToCommit request received 
from schaffoe at cluster, sock=11
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type Commit request received from 
schaffoe at cluster, sock=11
06/12/2006 16:13:27;0100;PBS_Server;Job;9339.cluster;enqueuing into feed, 
state 1 hop 1
06/12/2006 16:13:27;0100;PBS_Server;Job;9339.cluster;dequeuing from feed, 
state QUEUED
06/12/2006 16:13:27;0100;PBS_Server;Job;9339.cluster;enqueuing into xs, state 
1 hop 1
06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Queued at request of 
schaffoe at cluster, owner = schaffoe at cluster, job name = pbs_mpitest.pbs, 
queue = xs
06/12/2006 16:13:27;0040;PBS_Server;Svr;cluster;Scheduler sent command new
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type StatusNode request received from 
moab at cluster, sock=10
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type StatusJob request received from 
moab at cluster, sock=10
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type StatusQueue request received 
from moab at cluster, sock=10
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type ModifyJob request received from 
moab at cluster, sock=10
06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Modified at request 
of moab at cluster
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type RunJob request received from 
moab at cluster, sock=10
06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Run at request of 
moab at cluster
06/12/2006 16:13:27;0100;PBS_Server;Req;;Type ModifyJob request received from 
moab at cluster, sock=10
06/12/2006 16:13:27;0008;PBS_Server;Job;9339.cluster;Job Modified at request 
of moab at cluster
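
For anyone who wants to go through a log like this mechanically, here is a minimal sketch in Python. It assumes each record has six semicolon-separated fields, as the excerpt above suggests, and that wrapped continuation lines (as produced by the mail client here) can simply be appended to the preceding record. The field names in LogRecord are my own labels, not official TORQUE terminology.

```python
import re
from typing import Iterable, List, NamedTuple

# A record starts with a timestamp such as "06/12/2006 16:13:27;".
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2};")


class LogRecord(NamedTuple):
    # Field names are illustrative labels (assumption), one per ';' field.
    timestamp: str
    event_code: str
    daemon: str
    obj_type: str
    obj_name: str
    message: str


def unwrap(lines: Iterable[str]) -> List[str]:
    """Append lines that do not start with a timestamp to the previous record."""
    merged: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if TIMESTAMP.match(line) or not merged:
            merged.append(line)
        else:
            merged[-1] += line
    return merged


def parse_log(lines: Iterable[str]) -> List[LogRecord]:
    """Split each unwrapped record into its six fields; skip malformed lines."""
    records = []
    for line in unwrap(lines):
        parts = line.split(";", 5)  # the message itself may contain ';'
        if len(parts) == 6:
            records.append(LogRecord(*parts))
    return records


def job_events(records: Iterable[LogRecord], job_id: str) -> List[LogRecord]:
    """Return only the records that refer to the given job."""
    return [r for r in records if r.obj_type == "Job" and r.obj_name == job_id]
```

Calling job_events(parse_log(open(path)), "9339.cluster") on the server log would then list only the records for the job in question; path stands for wherever the server log lives on your installation.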

Nothing in the log looks weird to me. I did notice that the job script 
is still sitting in the server_priv/jobs directory, if that is of any 
interest. Also, the latest snapshot of 2.1.1 did not solve the problem 
either.
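
In case it helps someone reproduce the "Connection refused (111)" message, here is a minimal sketch that checks whether anything accepts TCP connections on the scheduler port. The host name "localhost" and the helper name port_is_open are assumptions for illustration; 15004 is the port named in the server output above. Whether a closed port there is related to the CPU problem at all is not something this check can tell.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution failures.
        return False
```

For example, port_is_open("localhost", 15004) returning False would match the "Connection refused" message printed by pbs_server.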

Regards,
-- 
Martin Schafföner

Cognitive Systems Group, Institute of Electronics, Signal Processing and 
Communication Technologies, Department of Electrical Engineering, 
Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063

