[torqueusers] procct and held jobs

Ken Nielson knielson at adaptivecomputing.com
Mon Oct 10 14:00:23 MDT 2011


----- Original Message -----
> From: "Ken Nielson" <knielson at adaptivecomputing.com>
> To: "Torque Users Mailing List" <torqueusers at supercluster.org>
> Cc: moabusers at supercluster.org
> Sent: Monday, October 10, 2011 6:45:29 AM
> Subject: Re: [torqueusers] procct and held jobs
> 
> 
> 
> ----- Original Message -----
> > From: "Gareth Williams" <Gareth.Williams at csiro.au>
> > To: torqueusers at supercluster.org, moabusers at supercluster.org
> > Sent: Monday, October 10, 2011 4:07:50 AM
> > Subject: [torqueusers] procct and held jobs
> > 
> > Hi All,
> > 
> > We recently updated torque from 3.0.2 to 3.0.3-snap.201108261653 and
> > have found that, at least in some cases, if we submit a job with a
> > hold (with qsub -a to run after a given time) to a routing queue,
> > when the job is released and moves to an execution queue it will
> > still not run because moab 6.0.2 sees a procct GRES. qstat -f shows
> > a procct resource only while the job is held and in the routing
> > queue.
> > 
> > Does anyone else with a recent torque version see this problem? You
> > can test with:
> > echo sleep 300 | qsub -a `date -d 'now + 5 minutes' +'%Y%m%d%H%M'`
> > 
> > This should hold for 5 minutes then run and sleep for 5 minutes.
> > 
> > Gareth
> > 
> > For reference, I've worked around the issue by defining in moab a
> > GLOBAL gres called procct with a large count.  The same technique
> > would probably work with maui.
> 
> Gareth,
> 
> That has been fixed in 2.5.8. I need to merge the fix with 3.0-fixes.
> I will get a snapshot when I do.
>

I was wrong. The fix did make it into the snapshot. What you are seeing is another case where procct is passed up to the scheduler, so it is a separate bug.
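
A rough way to confirm whether a given job still carries the resource is to look for it in the qstat -f output. The sketch below assumes the resource is printed as a "Resource_List.procct = N" line; has_procct and JOBID are illustrative names, not part of torque.

```shell
#!/bin/sh
# has_procct: read `qstat -f` output on stdin and succeed (exit 0)
# if a procct resource is listed for the job.
# Assumption: the resource shows up as "Resource_List.procct = N".
has_procct() {
    grep -Eq '^[[:space:]]*Resource_List\.procct[[:space:]]*='
}

# Usage (JOBID is a placeholder for the id printed by qsub):
#   qstat -f "$JOBID" | has_procct && echo "procct still present"
```

Running the check once while the job is held in the routing queue and again after it reaches the execution queue should show where the resource appears.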

Ken

