[torqueusers] procct and held jobs

Ken Nielson knielson at adaptivecomputing.com
Mon Oct 10 06:45:29 MDT 2011



----- Original Message -----
> From: "Gareth Williams" <Gareth.Williams at csiro.au>
> To: torqueusers at supercluster.org, moabusers at supercluster.org
> Sent: Monday, October 10, 2011 4:07:50 AM
> Subject: [torqueusers] procct and held jobs
> 
> Hi All,
> 
> We recently updated torque from 3.0.2 to 3.0.3-snap.201108261653 and
> have found that at least in some cases, if we submit a job with a
> hold (with qsub -a to run after a given time) to a routing queue,
> when the job is released and moves to an execution queue it will
> still not run because moab 6.0.2 sees a procct GRES. qstat -f shows
> a procct resource only while the job is held and in the routing
> queue.
> 
> Does anyone else with a recent torque version see this problem? You
> can test with:
> echo sleep 300 | qsub -a `date -d 'now + 5 minutes' +'%Y%m%d%H%M'`
> 
> This should hold for 5 minutes, then run and sleep for 5 minutes.
> 
> Gareth
> 
> For reference, I've worked around the issue by defining in moab a
> GLOBAL gres called procct with a large count.  The same technique
> would probably work with maui.
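
The test from the quoted message can be wrapped in a small script. This is a sketch under stated assumptions: `start_time` and `check_held_job` are hypothetical helper names, `date -d` requires GNU date, and the job is assumed to go to a default queue that is a routing queue, as in the report above. The script submits the held job and greps the `qstat -f` output for procct. Running the same grep after the job is released and moves to the execution queue shows whether the attribute is still present.

```shell
#!/bin/sh
# Hypothetical helper: print a qsub -a start time N minutes from now
# in CCYYMMDDhhmm form (relies on GNU date's -d option).
start_time() {
    date -d "now + $1 minutes" +'%Y%m%d%H%M'
}

# Hypothetical helper: submit the 5-minute held test job and show any
# procct entry in its full status. Assumes a TORQUE client on PATH and
# that the default queue is the routing queue.
check_held_job() {
    jobid=$(echo sleep 300 | qsub -a "$(start_time 5)") || return 1
    echo "submitted $jobid"
    # While the job is held in the routing queue, a procct resource
    # is reported to appear here.
    qstat -f "$jobid" | grep -i procct
}
```

Source the file and run `check_held_job`, then repeat `qstat -f "$jobid" | grep -i procct` once the job has left the routing queue.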

Gareth,

That has been fixed in 2.5.8. I need to merge the fix into 3.0-fixes, and I will make a snapshot available when I do.

Ken

