[torquedev] qhold support for job arrays

Brian O'Connor briano at sgi.com
Sun Oct 19 17:50:12 MDT 2008


Hi

Thanks Glen for your work on arrays in torque.

I can't see any mention of the new qhold code in the changelog
for 2.3.3. Is it in this code base, or in 2.4?

We are looking for a way to hold an array until a preceding job
completes, i.e. an array hold on a dependency. Does the new code cater for this?

Thanks again.

Brian O'Connor
-----------------------------------------------------------------------
SGI Consulting
Email: briano at sgi.com, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax:  +61 3 9963 1902
357 Camberwell Road, Camberwell, Victoria, 3124
AUSTRALIA
http://www.sgi.com/support/services
----------------------------------------------------------------------- 

> -----Original Message-----
> From: torquedev-bounces at supercluster.org 
> [mailto:torquedev-bounces at supercluster.org] On Behalf Of Glen Beane
> Sent: Saturday, May 10, 2008 4:33 PM
> To: torquedev at supercluster.org
> Subject: [torquedev] qhold support for job arrays
> 
> I've just added basic qhold support for job arrays to trunk.
> 
> If you pass an array ID to qhold, it will now place a hold on
> all the jobs in the array. This isn't complete yet: it
> doesn't do the right thing when a job is running and
> should be checkpointed and held (in fact, it just skips over
> those jobs right now).
> 
> If the job is running but can't be checkpointed, the hold
> gets set but the job continues to run (this is the same
> behavior as qhold starting in 2.4.0).
> 
> 
> 
> I am wondering about qhold for a single job that is running
> without checkpointing. On previous versions of torque the
> user would get an error message stating that the MOM does not
> support the requested service:
> 
> qhold: No support for requested service MSG=MOM rejected hold 
> request: 15029 jobid.server
> 
> Would it still be desirable to keep this behavior for single
> jobs when the running job can't be checkpointed and held?
> For arrays I think I will keep quiet, since for large arrays
> the user could get overwhelmed with error messages if I
> reported every job in the array that is running and can't be
> checkpointed.
> 
> 
> 
> 
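The per-job hold rules described in the quoted message can be summarised in a short Python sketch. This is a hypothetical model, not torque's server code: `Job`, `State`, `HoldRejected`, `hold_job`, `hold_array` and `hold_single_strict` are illustrative names, the job IDs are made up, and only the 15029 rejection text comes from the thread. It shows the three cases (queued, running and checkpointable, running and not checkpointable), the quiet handling for arrays, and the option of keeping the older error for a single job.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    QUEUED = "Q"
    RUNNING = "R"


@dataclass
class Job:
    job_id: str
    state: State
    checkpointable: bool
    held: bool = False


class HoldRejected(Exception):
    """Hypothetical stand-in for the MOM rejecting a hold (error 15029)."""


def hold_job(job: Job) -> bool:
    """Hold one job under the trunk rules described in the thread.

    Returns False when the job is skipped (running and checkpointable:
    checkpoint-then-hold is not implemented yet), True otherwise.
    """
    if job.state is State.RUNNING and job.checkpointable:
        return False  # should be checkpointed and held; currently skipped
    # Queued jobs are held; a running job that cannot be checkpointed
    # gets the hold flag but keeps running (the 2.4.0 behaviour).
    job.held = True
    return True


def hold_array(jobs: List[Job]) -> List[str]:
    """Hold every job in an array without raising per-job errors.

    Returns the IDs of skipped jobs so a caller could summarise them once
    instead of flooding the user with one message per job.
    """
    return [job.job_id for job in jobs if not hold_job(job)]


def hold_single_strict(job: Job) -> bool:
    """Model of keeping the pre-2.4 error for a single running job that
    cannot be checkpointed, instead of silently setting the hold."""
    if job.state is State.RUNNING and not job.checkpointable:
        raise HoldRejected(f"MOM rejected hold request: 15029 {job.job_id}")
    return hold_job(job)


if __name__ == "__main__":
    array = [
        Job("job-0", State.QUEUED, checkpointable=False),
        Job("job-1", State.RUNNING, checkpointable=True),
        Job("job-2", State.RUNNING, checkpointable=False),
    ]
    print(hold_array(array))  # prints ['job-1']
    try:
        hold_single_strict(Job("job-3", State.RUNNING, checkpointable=False))
    except HoldRejected as err:
        print(err)
```

Returning the skipped IDs from the array case, rather than raising, mirrors the concern that per-job errors would overwhelm the user on large arrays.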

