[torqueusers] qhold not functional
Al Taufer
ataufer at clusterresources.com
Thu Jun 12 10:49:49 MDT 2008
It seems that the code is returning an error message when it should
not.

The documentation says that, for a running job, if checkpoint/restart
is not supported, qhold will only set the requested hold attribute. This
has no effect unless the job is rerun with the qrerun command.

You should be able to verify that the hold is still being placed on the
job by running 'qstat -f' and checking the Hold_Types value.
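
As a rough sketch of that check: the hold_types helper below is a
made-up name for illustration, job ID 901 is taken from the quoted
output further down, and it assumes 'qstat -f' prints the attribute as
an indented "Hold_Types = <value>" line.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the
# Hold_Types value (assumes a line of the form "    Hold_Types = u").
hold_types() {
    awk -F' = ' '/^[ \t]*Hold_Types = / { print $2; exit }'
}

# Usage on a host with the TORQUE client commands installed:
#   qhold 901                  # may still print the MOM rejection
#   qstat -f 901 | hold_types  # anything other than "n" means a hold is set
#   qrerun 901                 # requeue the job so the hold can take effect
```

A value of "n" means no hold is set; "u" is a user hold, which is what
a plain qhold requests by default.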
Al
Walid wrote:
> Hi All,
>
> I have installed Torque 2.3.0 with Maui, but I am seeing different
> behaviour when I try to hold jobs: qhold complains that the request is
> rejected. When I check the MOM logs, they say that checkpointing is
> not supported. I am not interested in checkpointing, but I would like
> to be able to restart the jobs. Any pointers would be appreciated.
>
> regards
>
> Walid
>
> [root@lnx ~]# qstat -an
> lnx:
>                                                                    Req'd  Req'd   Elap
> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
> 901.lnx              luser    parallel STDIN        5270     1  --     --    -- R    --
>    lnx512/0
> [root@lnx ~]# qhold 901
> qhold: No support for requested service MSG=MOM rejected hold request: 15029 901.lnx
> pbs_mom;Req;req_reject;Reject reply code=15029(No support for requested service REJHOST=lnx512 MSG=checkpointing not supported), aux=0, type=HoldJob, from PBS_Server@lnx