[torqueusers] Job eligible, nodes free, but job would not start

Neelesh Arora narora at Princeton.EDU
Thu Oct 12 16:58:09 MDT 2006

Hi All,

I am using torque-2.0.0p2 and maui-3.2.6p13, and noticed the following 
behavior today:

- There are several jobs in the queue that stay in the Q state. When I run 
checkjob <jobid>, I get (among other things):
"job can run in partition DEFAULT (63 procs available.  1 procs required)"
but the job remains in Q forever. As the message above shows, this is not 
a case of an unmet resource requirement.

- nothing untoward in the torque logs

- I see several of these messages in maui.log:
MSysRegEvent(JOBCORRUPTION:  job 'jobid' has the following idle node(s) 
allocated: 'node114' ,0,0,1)
but these refer to running jobs, not the queued jobs in question

- I also see messages like these in the maui.log:
INFO:     PBS node node114 set to state Idle (free)
INFO:     node 'node114' changed states from Running to Idle
although this node has 2 of its 4 procs busy. This message is repeated 
for several nodes.

- restarting torque and maui did not help either

- if I say qrun <jobid> for the stuck jobs, I get:
qrun: Resource temporarily unavailable <jobid>

- but if I run runjob <jobid>, the jobs do start!

I am unable to correlate all this information. Does anyone know what 
could be going wrong, or where else I can look?
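
One way to start correlating the maui.log messages quoted above is to tally them per node. The following is a minimal sketch in Python, assuming the log lines have exactly the shape quoted in this message (a real maui.log may differ in spacing or prefixes). The names summarize_maui_log and suspect_nodes are hypothetical helpers, not part of Torque or Maui. It lists nodes that Maui both moved from Running to Idle and flagged in a JOBCORRUPTION event:

```python
import re
from collections import Counter

# Patterns modelled on the maui.log lines quoted in this message.
# Assumption: real log lines match this shape; \s+ tolerates extra spaces
# and the line wrapping introduced by the mail client.
STATE_CHANGE = re.compile(
    r"node '(?P<node>[^']+)' changed states from (?P<old>\w+) to (?P<new>\w+)"
)
CORRUPTION = re.compile(
    r"JOBCORRUPTION:\s+job '(?P<job>[^']+)' has the following idle "
    r"node\(s\)\s+allocated:\s+'(?P<node>[^']+)'"
)


def summarize_maui_log(log_text):
    """Count, per node, Running->Idle transitions and JOBCORRUPTION events."""
    to_idle = Counter()
    corrupt = Counter()
    for m in STATE_CHANGE.finditer(log_text):
        if m.group("old") == "Running" and m.group("new") == "Idle":
            to_idle[m.group("node")] += 1
    for m in CORRUPTION.finditer(log_text):
        corrupt[m.group("node")] += 1
    return to_idle, corrupt


def suspect_nodes(log_text):
    """Nodes that appear in both kinds of message, sorted by name."""
    to_idle, corrupt = summarize_maui_log(log_text)
    return sorted(set(to_idle) & set(corrupt))


# Sample built from the lines quoted above ('jobid' is a placeholder).
sample = (
    "MSysRegEvent(JOBCORRUPTION:  job 'jobid' has the following idle node(s) "
    "allocated: 'node114' ,0,0,1)\n"
    "INFO:     PBS node node114 set to state Idle (free)\n"
    "INFO:     node 'node114' changed states from Running to Idle\n"
)
print(suspect_nodes(sample))
```

Nodes on this list are the ones where Maui's view (Idle) disagrees with the jobs actually running there, so they are reasonable candidates to compare against what the resource manager reports for the same nodes.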


