[torqueusers] Jobs stuck in Queue
Joshua Bernstein
jbernstein at penguincomputing.com
Thu Oct 4 14:35:18 MDT 2007
Bill Wichser wrote:
>
>
> Joshua Bernstein wrote:
>> Interesting,
>>
>> Bill Wichser wrote:
>>> Are you giving it enough time to clear the data from Torque?
>>> Sometimes it takes a bit.
>>
>> What would you say a "bit" is? I'd imagine it would clear out after at
>> least 30 seconds, if not right away.
>
> In my experience with Torque/PBS, a "bit" can be longer than 30 seconds.
Interesting... You'd think it would happen right away.
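
One way to put a number on a "bit" would be to poll qstat right after the qdel and time how long the server keeps reporting the job. Below is a rough Python sketch, not a tested tool. It assumes qstat exits non-zero once the server no longer knows the job ID. The function name wait_until_cleared and its parameters are made up for illustration and are not part of Torque.

```python
import subprocess
import time


def wait_until_cleared(job_id, timeout=300.0, interval=5.0,
                       run=None, sleep=time.sleep, clock=time.monotonic):
    """Poll `qstat <job_id>` until it fails or `timeout` seconds pass.

    Returns the seconds waited, or None if the job was still listed
    when the timeout expired. `run`, `sleep` and `clock` are hooks so
    the loop can be exercised without a Torque server.
    """
    if run is None:
        def run(cmd):
            # Assumption: a non-zero exit status means the server no
            # longer knows this job ID.
            return subprocess.run(cmd,
                                  stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL).returncode

    start = clock()
    while True:
        if run(["qstat", job_id]) != 0:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Calling wait_until_cleared("1234.headnode") (a placeholder job ID) right after the qdel would return the elapsed seconds, or None if the job is still listed after five minutes. If the server is set up to keep completed jobs around for a while, the job may stay listed in a completed state, so the number would overstate the real cleanup time.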
>>
>>> Also try using qsig instead of qdel for running jobs.
>>
>> What's the difference? Doesn't a qdel send a SIGKILL?
>>
>> Also, the jobs are clearly getting the SIGKILL, because a ps on the
>> node shows that the jobs don't exist. I'm doing a watch ps, and I can
>> see that right after I issue the qdel, the processes begin to clean
>> themselves up and eventually disappear from the process table.
>
> I've had much better "luck" with sending qsig to running jobs and qdel
> to those not running. Things may have changed in recent releases, but
> long ago, in the PBS & OpenPBS days, qdel just never seemed to get it all right.
Doesn't seem to work with qsig either. Seems my luck isn't as good :-(
-Josh