[torquedev] qhold support for job arrays
Brian O'Connor
briano at sgi.com
Sun Oct 19 17:50:12 MDT 2008
Hi
Thanks Glen for your work on arrays in torque.
I can't see any mention of the new qhold code in the change log
for 2.3.3. Is it in this code base, or in 2.4?
We are looking for an array hold that waits on the completion of a prior
job, i.e. an array hold on a dependency. Does the new code cater for this?
Thanks again.
Brian O'Connor
-----------------------------------------------------------------------
SGI Consulting
Email: briano at sgi.com, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax: +61 3 9963 1902
357 Camberwell Road, Camberwell, Victoria, 3124
AUSTRALIA
http://www.sgi.com/support/services
-----------------------------------------------------------------------
> -----Original Message-----
> From: torquedev-bounces at supercluster.org
> [mailto:torquedev-bounces at supercluster.org] On Behalf Of Glen Beane
> Sent: Saturday, May 10, 2008 4:33 PM
> To: torquedev at supercluster.org
> Subject: [torquedev] qhold support for job arrays
>
> I've just added basic qhold job array support to trunk.
>
> If you pass an array id to qhold, it will now place a hold on
> all the jobs in the array. This isn't complete yet and
> doesn't do the right thing if a job is running and it
> should be checkpointed and held (in fact it just skips over
> those jobs right now).
>
> If the job is running but can't be checkpointed then the hold
> gets set but the job continues to run (this is the same
> behavior as qhold starting in 2.4.0)
>
>
>
> I am wondering about qhold for a single job that is running
> and can't be checkpointed. In previous versions of torque the
> user would get an error message stating that the mom does not
> support the requested service:
>
> qhold: No support for requested service MSG=MOM rejected hold
> request: 15029 jobid.server
>
> Would it still be desirable to keep this behavior for single
> jobs when the running job can't be checkpointed and held?
> For arrays I think I will keep quiet, since for large arrays
> the user could get overwhelmed with error messages if I
> reported every job in the array that is running and can't be
> checkpointed.
>
>
>
>
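The array behaviour described in the quoted message can be modelled roughly as below. This is only an illustrative Python sketch, not the TORQUE server code (which is written in C). The `Job` class, its fields, the `hold_array` function and the job ids are all hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical stand-in for one sub-job of an array.
    job_id: str
    running: bool = False
    checkpointable: bool = False
    held: bool = False


def hold_array(jobs):
    """Set a hold on every job in the array, following the quoted description.

    Returns the ids of the jobs that were skipped.
    """
    skipped = []
    for job in jobs:
        if job.running and job.checkpointable:
            # Ideally this job would be checkpointed and then held.
            # The message says such jobs are simply skipped for now.
            skipped.append(job.job_id)
            continue
        # Queued jobs get the hold. Running jobs that cannot be
        # checkpointed also get the hold set but keep running, and no
        # per-job error is reported for them.
        job.held = True
    return skipped


# Hypothetical three-job array: queued, running (no checkpoint),
# and running (checkpointable).
array = [
    Job("100-0"),
    Job("100-1", running=True),
    Job("100-2", running=True, checkpointable=True),
]
skipped = hold_array(array)
print([job.held for job in array], skipped)
```

In this model the queued job and the running job that cannot be checkpointed both end up with the hold set, while the running checkpointable job is left untouched and reported as skipped.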