[torquedev] qhold support for job arrays

Glen Beane glen.beane at gmail.com
Sat May 10 00:32:46 MDT 2008


I've just added basic qhold support for job arrays to trunk.

If you pass an array id to qhold, it will now place a hold on all the jobs in
the array. This isn't complete yet: it doesn't do the right thing when a job
is running and should be checkpointed and held (in fact, it just skips over
those jobs right now).

If the job is running but can't be checkpointed, then the hold gets set but
the job continues to run (this is the same behavior qhold has had since
2.4.0).
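
To make the three cases above concrete, here is a rough sketch of the
per-job decision. It is only an illustration in Python, not the code in
trunk: the Job class, its fields, the hold_array function and the job ids
are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical stand-in for a job in an array; not a TORQUE structure.
    job_id: str
    running: bool = False
    checkpointable: bool = False
    held: bool = False


def hold_array(jobs):
    """Apply the hold rules described above to every job in an array.

    Returns the ids of the jobs that were skipped.
    """
    skipped = []
    for job in jobs:
        if job.running and job.checkpointable:
            # Should be checkpointed and held; that is not handled yet,
            # so the job is skipped for now.
            skipped.append(job.job_id)
            continue
        # Queued jobs are simply held. Running jobs that cannot be
        # checkpointed get the hold set but keep running.
        job.held = True
    return skipped


if __name__ == "__main__":
    array = [
        Job("job-0"),                                     # queued
        Job("job-1", running=True, checkpointable=True),  # skipped for now
        Job("job-2", running=True),                       # held, still runs
    ]
    print(hold_array(array))
    print([(j.job_id, j.held, j.running) for j in array])
```

Only the second case (running and checkpointable) is left untouched; the
other two end up with the hold set.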



I am wondering about qhold for a single job that is running and can't be
checkpointed. On previous versions of torque the user would get an error
message stating that the mom does not support the requested service:

qhold: No support for requested service MSG=MOM rejected hold request: 15029
jobid.server

Would it still be desirable to keep this behavior for single jobs when the
running job can't be checkpointed and held? For arrays I think I will keep
quiet, since for large arrays the user could get overwhelmed with error
messages if I reported every job in the array that is running and can't be
checkpointed.