[Mauiusers] Re: [torqueusers] Job eligible, nodes free, but job would not start
Neelesh Arora
narora at Princeton.EDU
Fri Oct 13 10:43:00 MDT 2006
An update:
I notice that when these jobs are stuck, one way to get them started is
to set a walltime (using qalter) less than the default walltime. We set
a default_walltime of 9999:00:00 at the server level and require the
users to specify the needed cpu-time.
This was set a long time ago and has not been causing any issues. But it
seems now that if you have set this default and then a user submits a
job with an explicit -l walltime=<time> specification, then that job
runs while older jobs with default walltime wait.
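
For scale, here is a minimal Python sketch that converts a walltime string to seconds, so the 9999:00:00 default can be compared with an explicit request. It assumes the [[HH:]MM:]SS form; the helper name walltime_to_seconds and the 24:00:00 value are illustrative only and are not part of our setup. It does not explain the scheduling behavior, it only shows how large the default is.

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert a [[HH:]MM:]SS walltime string to seconds (hypothetical helper)."""
    fields = walltime.split(":")
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"expected [[HH:]MM:]SS, got {walltime!r}")
    seconds = 0
    for field in fields:
        value = int(field)  # raises ValueError on non-numeric input
        if value < 0:
            raise ValueError(f"negative field in {walltime!r}")
        # Each further field is a factor of 60 finer than the previous one.
        seconds = seconds * 60 + value
    return seconds


DEFAULT_WALLTIME = "9999:00:00"  # server-level default quoted in this message
EXPLICIT_WALLTIME = "24:00:00"   # illustrative value, an assumption

default_s = walltime_to_seconds(DEFAULT_WALLTIME)
explicit_s = walltime_to_seconds(EXPLICIT_WALLTIME)
print(f"default : {default_s} s ({default_s / 86400} days)")
print(f"explicit: {explicit_s} s ({explicit_s / 86400} days)")
```

The default works out to 35996400 seconds, or 416.625 days, against one day for the illustrative explicit request.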
Can someone please shed some light on this? I am out of clues here.
Thanks.
-Neel
Neelesh Arora wrote:
> Hi All,
>
> I am using torque-2.0.0p2 and maui-3.2.6p13, and notice the following
> behavior today:
>
> - There are several jobs in the queue that are in the Q state. When I do
> checkjob <jobid>, I get (among other things):
> "job can run in partition DEFAULT (63 procs available. 1 procs required)"
> but the job remains in Q forever. This is not a case of an unmet
> resource requirement, as the message above indicates.
>
> - nothing untoward in the torque logs
>
> - I see several of these messages in maui.log:
> MSysRegEvent(JOBCORRUPTION: job 'jobid' has the following idle node(s)
> allocated: 'node114' ,0,0,1)
> but these are for the running jobs, not the Q'ed jobs in question
>
> - I also see messages like these in the maui.log:
> INFO: PBS node node114 set to state Idle (free)
> INFO: node 'node114' changed states from Running to Idle
> although this node has 2 of its 4 procs busy.
> This message is repeated for several nodes.
>
> - restarting torque and maui did not help either
>
> - if I say qrun <jobid> for the stuck jobs, I get:
> qrun: Resource temporarily unavailable <jobid>
>
> - but if I do runjob <jobid>, the jobs are started!
>
> I am unable to correlate all this information. Does anyone know what
> could be going wrong, or where else I can look?
>
> Thanks.
>
> -Neel
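
When many jobs are stuck, the checkjob line quoted above can be picked out mechanically. The following is a rough Python sketch, assuming the message always has exactly the form shown in this thread; the pattern is derived from that single quoted line, so other Maui versions may word it differently, and the name parse_can_run is hypothetical.

```python
import re

# Pattern built from the one checkjob line quoted in this thread (an assumption
# about the general format, not taken from Maui documentation).
CAN_RUN = re.compile(
    r"job can run in partition (?P<partition>\S+) "
    r"\((?P<available>\d+) procs available\.\s+(?P<required>\d+) procs required\)"
)


def parse_can_run(line: str):
    """Return (partition, procs_available, procs_required), or None if no match."""
    match = CAN_RUN.search(line)
    if match is None:
        return None
    return (
        match.group("partition"),
        int(match.group("available")),
        int(match.group("required")),
    )


sample = '"job can run in partition DEFAULT (63 procs available. 1 procs required)"'
result = parse_can_run(sample)
if result is not None:
    partition, available, required = result
    # A job that reports enough free procs but stays in Q is a candidate for
    # closer inspection.
    print(f"{partition}: {available} available, {required} required")
```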