[torqueusers] procct and held jobs
knielson at adaptivecomputing.com
Mon Oct 10 14:00:23 MDT 2011
----- Original Message -----
> From: "Ken Nielson" <knielson at adaptivecomputing.com>
> To: "Torque Users Mailing List" <torqueusers at supercluster.org>
> Cc: moabusers at supercluster.org
> Sent: Monday, October 10, 2011 6:45:29 AM
> Subject: Re: [torqueusers] procct and held jobs
> ----- Original Message -----
> > From: "Gareth Williams" <Gareth.Williams at csiro.au>
> > To: torqueusers at supercluster.org, moabusers at supercluster.org
> > Sent: Monday, October 10, 2011 4:07:50 AM
> > Subject: [torqueusers] procct and held jobs
> > Hi All,
> > We recently updated torque from 3.0.2 to 3.0.3-snap.201108261653 and
> > have found that, at least in some cases, if we submit a job with a
> > hold (with qsub -a to run after a given time) to a routing queue,
> > when the job is released and moves to an execution queue it will
> > still not run because moab 6.0.2 sees a procct GRES. qstat -f shows
> > a procct resource only while the job is held and in the routing
> > queue.
> > Does anyone else with a recent torque version see this problem? You
> > can test with:
> > echo sleep 300 | qsub -a `date -d 'now + 5 minutes' +'%Y%m%d%H%M'`
> > This should hold for 5 minutes then run and sleep for 5 minutes.
> > Gareth
> > For reference, I've worked around the issue by defining in moab a
> > GLOBAL gres called procct with a large count. The same technique
> > would probably work with maui.
> That has been fixed in 2.5.8. I need to merge the fix with 3.0-fixes.
> I will get a snapshot when I do.
I was wrong. The fix did make it into the snapshot. This is a different
case in which procct is still passed up to the scheduler, so it is
another bug.
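
For anyone who wants to repeat the test quoted above and check for the procct resource at the same time, a rough sketch follows. It assumes GNU date, that qsub and qstat are on the PATH, and that the default queue is a routing queue. The helper names start_after and has_procct are made up for illustration and are not part of torque. The NODECFG line in the comment is only a guess at what the GLOBAL gres workaround might look like in moab.cfg, with an arbitrary count.

```shell
#!/bin/sh
# Sketch only: start_after and has_procct are hypothetical helper names.

# Print a timestamp N minutes from now in the YYYYMMDDhhmm form that
# "qsub -a" accepts. Relies on GNU date's -d option (an assumption).
start_after() {
    date -d "now + $1 minutes" +'%Y%m%d%H%M'
}

# Read "qstat -f" output on stdin and succeed if any line mentions procct.
has_procct() {
    grep -q 'procct'
}

# Only talk to the server when the torque client tools are installed.
if command -v qsub >/dev/null 2>&1 && command -v qstat >/dev/null 2>&1; then
    # Same reproduction as in the original report: wait 5 minutes, then
    # run a job that sleeps for 5 minutes.
    jobid=$(echo "sleep 300" | qsub -a "$(start_after 5)")
    echo "submitted $jobid"

    if qstat -f "$jobid" | has_procct; then
        echo "procct is present on $jobid"
    else
        echo "procct is not present on $jobid"
    fi
fi

# Possible shape of the workaround in moab.cfg (count is arbitrary and
# the line is an assumption, not taken from the report):
#   NODECFG[GLOBAL] GRES=procct:10000
```

Running the check once while the job is still waiting in the routing queue and again after it has moved to the execution queue should show whether procct appears only in the first case, as described in the report.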