[torqueusers] qhold not functional

Glen Beane glen.beane at gmail.com
Thu Jun 12 10:53:21 MDT 2008


On Thu, Jun 12, 2008 at 12:49 PM, Al Taufer <ataufer at clusterresources.com>
wrote:

> It seems that the code is returning an error message when it should not be
> returning one.
>
> The documentation says that, for a running job, if checkpoint/restart is
> not supported, qhold will only set the requested hold attribute. This will
> have no effect unless the job is rerun with the qrerun command.


I know this is the case in the 2.4 snapshots: the hold does get set, and
qhold displays no error message.  Pre-2.4 TORQUE versions complain that the
job can't be checkpointed and don't set the hold.  Which version of the
documentation says the hold will be set even if the job can't be
checkpointed?



>
>
> You should be able to verify that the hold is still being placed on the job
> by using 'qstat -f' and checking the Hold_Types value.
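
To make that check concrete, here is a minimal sketch that pulls the
Hold_Types value out of 'qstat -f' output. The helper name hold_types is
made up for illustration and is not part of TORQUE; the sketch also assumes
the attribute appears on its own line as "Hold_Types = <value>", which is
how qstat -f normally lays out job attributes.

```shell
# hold_types: hypothetical helper (not a TORQUE command).
# Reads `qstat -f` output on stdin and prints the Hold_Types value.
# Assumes the attribute is printed as "    Hold_Types = <value>".
hold_types() {
    awk -F' = ' '/^[ \t]*Hold_Types = / { print $2; exit }'
}

# Usage against a live server (job ID 901 is the one from this thread):
#   qstat -f 901 | hold_types
```

If the helper prints "n", no hold is set; "u", "o" and "s" stand for user,
operator and system holds.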
>
> Al
>
> Walid wrote:
>
>> Hi All,
>>
>> I have installed TORQUE 2.3.0 with Maui, but I am seeing different
>> behaviour when I try to hold jobs: qhold complains that the request is
>> rejected. The MOM logs say that checkpointing is not supported. I am not
>> interested in checkpointing, but I would like to be able to restart
>> jobs. Any pointers would be appreciated.
>>
>> regards
>>
>> Walid
>>
>> [root@lnx ~]# qstat -an
>> lnx:
>>                                                                    Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
>> 901.lnx              luser    parallel STDIN        5270     1  --     --    -- R    --
>>    lnx512/0
>> [root@lnx ~]# qhold 901
>> qhold: No support for requested service MSG=MOM rejected hold request: 15029 901.lnx
>> pbs_mom;Req;req_reject;Reject reply code=15029(No support for requested service REJHOST=lnx512 MSG=checkpointing not supported), aux=0, type=HoldJob, from PBS_Server@lnx
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>