[torqueusers] qhold not functional

Al Taufer ataufer at clusterresources.com
Thu Jun 12 10:49:49 MDT 2008


It seems that the code is returning an error message when it should not 
be returning one.

The documentation says that, for a running job, if checkpoint/restart 
is not supported, qhold only sets the requested hold attribute. The hold 
has no effect unless the job is rerun with the qrerun command.

You should be able to verify that the hold is still being placed on the 
job by using 'qstat -f' and checking the Hold_Types value.
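
As a rough sketch of that check (not taken from the TORQUE sources), the 
helper below pulls the Hold_Types value out of 'qstat -f' output read on 
stdin. The function name hold_types is made up for illustration, and the 
job id 901 is the one from the quoted message. A value of 'n' should mean 
no hold is set; 'u', 'o' and 's' stand for user, operator and system holds.

```shell
# Hypothetical helper: print the Hold_Types value found in `qstat -f`
# output supplied on stdin (attribute lines look like "    Hold_Types = u").
hold_types() {
    awk -F' = ' '/^[[:space:]]*Hold_Types/ { print $2 }'
}

# Assumed usage on a host with the TORQUE client commands installed:
#   qstat -f 901 | hold_types
# If a hold is reported and the job should actually stop running,
# rerunning it is what makes the hold take effect:
#   qrerun 901
```

If the function prints nothing, the attribute was not present in the 
output at all, which would be worth reporting separately.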

Al

Walid wrote:
> Hi All,
>
> I have installed TORQUE 2.3.0 with Maui, but I am seeing different 
> behaviour when I try to hold jobs: qhold complains that the request is 
> rejected. When I check the MOM logs, they mention that checkpointing is 
> not supported. I am not interested in checkpointing, but I would like 
> to have the ability to restart the jobs. Any pointers would be 
> appreciated.
>
> regards
>
> Walid
>
> [root at lnx ~]# qstat -an
> lnx:
>                                                                    Req'd  Req'd   Elap
> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
> 901.lnx              luser    parallel STDIN        5270     1  --    --    --  R   --
>    lnx512/0
> [root at lnx ~]# qhold 901
> qhold: No support for requested service MSG=MOM rejected hold request: 15029 901.lnx
> pbs_mom;Req;req_reject;Reject reply code=15029(No support for requested service REJHOST=lnx512 MSG=checkpointing not supported), aux=0, type=HoldJob, from PBS_Server at lnx
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>   

