[torqueusers] Removing the "exec_host" attribute from a queued job ?
Wolfgang Wander
wwc at rentec.com
Tue Sep 20 04:36:14 MDT 2005
Simon Robbins writes:
>
> Hello,
>
> On Tue, 20 Sep 2005, Chris Samuel wrote:
>
> > Hi folks,
> >
> > I've got a job that's queued and obviously tried to start and failed and has
> > ended up with the following attribute set on it:
> >
> > exec_host = edda010/0+edda007/3+edda007/2+edda007/1
> >
> > I suspect it's stopping Moab or Torque from running it again on other nodes,
> > and I'd like to clear that attribute, but it doesn't appear to be accessible
> > through qalter or qmgr.
> >
> > Any clues ?
>
> Unfortunately no. I have been seeing this behaviour
> for months now with torque_1.2.0p2,4,5 and 6. From
> Maui I get:
> HostList:
> [n504:1]
> Messages: cannot start job - RM failure, rc: 15041, msg:
> 'Execution server rejected request MSG=send failed, STARTING'
>
> Sometimes this is associated with a failure in the
> network.
>
I've noticed that you can start the job by hand with
qrun -H [free-node] jobid. You'll have to find a [free-node]
manually, though, to make this work...
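
A rough sketch of that workaround as a shell helper, in case it is useful.
It assumes that "pbsnodes -l free" prints one "<node> <state>" line per
free node; that state filter may not exist in every TORQUE release, so
check your local pbsnodes man page first. The function names are made up
for illustration, and nothing here checks whether the chosen node actually
satisfies the job's resource request.

```shell
#!/bin/sh
# Sketch only. Assumes `pbsnodes -l free` emits "<node> <state>" lines.

# Print the first node whose state column is exactly "free" (reads stdin).
first_free_node() {
    awk '$2 == "free" { print $1; exit }'
}

# Force-start the given job on the first free node found.
# $1 is the job id as shown by qstat.
run_on_free_node() {
    jobid=$1
    node=$(pbsnodes -l free | first_free_node)
    if [ -z "$node" ]; then
        echo "no free node found" >&2
        return 1
    fi
    qrun -H "$node" "$jobid"
}
```

After sourcing the file, usage would be "run_on_free_node <jobid>", run as
a user with manager or operator rights, since qrun needs them.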
Wolfgang