[torqueusers] Jobs stuck in Queue

Joshua Bernstein jbernstein at penguincomputing.com
Thu Oct 4 14:35:18 MDT 2007



Bill Wichser wrote:
> 
> 
> Joshua Bernstein wrote:
>> Interesting,
>>
>> Bill Wichser wrote:
>>> Are you giving it enough time to clear the data from Torque?  
>>> Sometimes it takes a bit.
>>
>> What would you say is a "bit"? I'd imagine it would clear out within 
>> 30 seconds, if not right away.
> 
> In my experience with Torque/PBS, a "bit" can be longer than 30 seconds.

Interesting... You'd think it would happen right away.
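
One way to measure how long a "bit" actually is: a minimal Python sketch that polls until a job drops out of the queue and reports the elapsed time. It assumes `qstat <jobid>` exits non-zero once the server no longer knows the job ID. `wait_until_gone` and `job_in_qstat` are hypothetical helper names, not part of Torque.

```python
import subprocess
import time
from typing import Callable


def wait_until_gone(is_present: Callable[[], bool],
                    timeout: float = 120.0,
                    interval: float = 5.0) -> float:
    """Poll is_present() until it returns False.

    Returns the elapsed time in seconds, or raises TimeoutError if the
    job is still present after `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        if not is_present():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"job still present after {timeout:.0f} s")
        time.sleep(interval)


def job_in_qstat(job_id: str) -> bool:
    # Assumption: qstat returns exit status 0 only while the server still
    # knows the job. A server that keeps finished jobs around (state C)
    # may keep reporting the job for a while after it has been killed.
    result = subprocess.run(["qstat", job_id],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, right after a qdel you might call `wait_until_gone(lambda: job_in_qstat("1234.server"))`, where the job ID is a placeholder. Passing the check in as a callable keeps the polling loop independent of qstat, so the same loop could just as well watch for the processes to vanish from `ps` on the node.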

>>
>>> Also try using qsig instead of qdel for running jobs.
>>
>> What's the difference? Doesn't qdel send a SIGKILL?
>>
>> Also, the jobs are clearly getting the SIGKILL, because a ps on the 
>> node shows that the jobs don't exist. I'm doing a watch ps, and I can 
>> see that right after I issue the qdel, the processes begin to clean 
>> themselves up and eventually disappear from the process table.
> 
> I've had much better "luck" sending qsig to running jobs and qdel to 
> those that aren't running. Things may have changed in recent releases, 
> but long ago, in the PBS and OpenPBS days, qdel just never seemed to 
> get it all right.

Doesn't seem to work with qsig either. Seems my luck isn't as good :-(

-Josh

