[torqueusers] Jobs stuck in Queue

Bill Wichser bill at Princeton.EDU
Thu Oct 4 14:17:05 MDT 2007



Joshua Bernstein wrote:
> Interesting,
> 
> Bill Wichser wrote:
>> Are you giving it enough time to clear the data from Torque?  
>> Sometimes it takes a bit.
> 
> What would you call a "bit"? I'd imagine it would clear out after at 
> most 30 seconds, if not right away.

In my experience with Torque/PBS, a "bit" can be longer than 30 seconds.
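
If you want to wait rather than guess, something like the following could poll until the job drops out of the queue. This is only a rough sketch: the function name wait_job_gone and the 120 second default are made up for illustration, and it assumes qstat exits with a non-zero status once the job id is no longer known to the server.

```shell
# Hypothetical helper: poll until the job leaves the queue or time runs out.
# $1 = job id, $2 = timeout in seconds, $3 = command that succeeds while the
# job is still listed (defaults to qstat; assumed to fail for unknown jobs).
wait_job_gone() {
    job_id="$1"
    timeout="${2:-120}"
    check="${3:-qstat}"
    waited=0
    while "$check" "$job_id" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1   # still listed after the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0           # job no longer listed
}
```

For example, "wait_job_gone 1234 300" would give job 1234 up to five minutes to clear before reporting failure.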

> 
>> Also try using qsig instead of qdel for running jobs.
> 
> What's the difference? Doesn't a qdel send a SIGKILL?
> 
> Also, the jobs are clearly getting the SIGKILL, because a ps on the node 
> shows that the jobs don't exist. I'm doing a watch ps, and I can see 
> that right after I issue the qdel, the processes begin to clean 
> themselves up and eventually disappear from the process table.

I've had much better "luck" sending qsig to running jobs and qdel 
to those not running.  Things may have changed in recent releases, but 
long ago, in the PBS and OpenPBS days, qdel never seemed to get everything right.
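
A rough sketch of that approach, in case it helps. The function names here are made up, and the choice of SIGKILL is an assumption taken from the question above, not something qsig requires (it accepts other signals via -s). It reads the job state from the "job_state = ..." line of "qstat -f" and only prints the command it would pick, so nothing is actually signalled or deleted.

```shell
# Hypothetical helper: choose the command based on the job state.
# State "R" (running) gets qsig; any other state gets qdel.
cancel_cmd() {
    job_id="$1"
    state="$2"
    if [ "$state" = "R" ]; then
        echo "qsig -s SIGKILL $job_id"
    else
        echo "qdel $job_id"
    fi
}

# Read the state from `qstat -f`, which prints a line like "job_state = R".
job_state() {
    qstat -f "$1" | awk '/job_state/ {print $3}'
}

# Usage (prints the chosen command instead of running it):
#   cancel_cmd 1234 "$(job_state 1234)"
```

Once the printed command looks right for your site, it can be run by hand or passed to the shell.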

Bill

> 
> -Joshua Bernstein
> Software Engineer
> Penguin Computing

