[Mauiusers] Re: [torqueusers] Job eligible, nodes free, but job would not start

Neelesh Arora narora at Princeton.EDU
Fri Oct 13 10:43:00 MDT 2006


An update:
I have noticed that when these jobs are stuck, one way to get them started 
is to set a walltime (using qalter) that is less than the default walltime. 
We set a default_walltime of 9999:00:00 at the server level and require 
users to specify the CPU time they need.

This was set a long time ago and had not caused any issues until now. It 
now seems that if this default is set and a user submits a job with an 
explicit -l walltime=<time> specification, that job runs while older jobs 
with the default walltime wait.
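
For reference, a minimal sketch of the qalter workaround described above. 
The helper name shorten_walltime, the DRY_RUN variable, and the 24:00:00 
fallback value are illustrative assumptions, not part of our setup; by 
default the helper only prints the command it would run.

```shell
# Hypothetical helper: lower the walltime of a queued job with qalter.
# Usage: shorten_walltime <jobid> [walltime]
# With DRY_RUN unset or set to 1, the command is printed, not executed.
shorten_walltime() {
    jobid="$1"
    walltime="${2:-24:00:00}"   # assumed placeholder, below the 9999:00:00 default
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "qalter -l walltime=${walltime} ${jobid}"
    else
        qalter -l "walltime=${walltime}" "${jobid}"
    fi
}
```

Calling it with DRY_RUN=0 runs qalter for real; the job ID is whatever 
qstat reports for the stuck job.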

Can someone please shed some light on this? I am out of clues here.

Thanks.

-Neel

Neelesh Arora wrote:
> Hi All,
> 
> I am using torque-2.0.0p2 and maui-3.2.6p13, and notice the following 
> behavior today:
> 
> - There are several jobs in the queue that are in the Q state. When I do 
> checkjob <jobid>, I get (among other things):
> "job can run in partition DEFAULT (63 procs available.  1 procs required)"
> but the job remains in Q forever. It is not the case of a resource 
> requirement not being met (as the above message indicates)
> 
> - nothing untoward in the torque logs
> 
> - I see several of these messages in maui.log:
> MSysRegEvent(JOBCORRUPTION:  job 'jobid' has the following idle node(s) 
> allocated: 'node114' ,0,0,1)
> but these are for the running jobs, not the Q'ed jobs in question
> 
> - I also see messages like these in the maui.log:
> INFO:     PBS node node114 set to state Idle (free)
> INFO:     node 'node114' changed states from Running to Idle
> although this node has 2 out of 4 procs busy.
> These messages are repeated for several nodes.
> 
> - restarting torque and maui did not help either
> 
> - if I say qrun <jobid> for the stuck jobs, I get:
> qrun: Resource temporarily unavailable <jobid>
> 
> - but if I do runjob <jobid>, the jobs are started !!
> 
> I am unable to correlate all this information. Does anyone know what 
> could be going wrong, or where else I can look?
> 
> Thanks.
> 
> -Neel
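
To see which nodes the JOBCORRUPTION messages quoted above point at, a 
small filter over maui.log may help. This is only a sketch: the function 
name corrupt_nodes is made up, and it assumes each message sits on a 
single line in the log (the line break in the quoted excerpt is taken to 
be e-mail wrapping).

```shell
# Hypothetical helper: print "<node> <count>" for every node named in a
# JOBCORRUPTION message in the given Maui log file.
# Usage: corrupt_nodes /path/to/maui.log
corrupt_nodes() {
    grep 'JOBCORRUPTION' "$1" \
        | sed -n "s/.*allocated: '\([^']*\)'.*/\1/p" \
        | sort | uniq -c \
        | awk '{print $2, $1}'   # node name first, then number of messages
}
```

The node names in the output can then be compared against the nodes that 
the log reports as switching from Running to Idle.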

