[torqueusers] Torque's scheduling problem

Garrick Staples garrick at usc.edu
Tue Oct 18 14:01:45 MDT 2005


On Tue, Oct 18, 2005 at 01:42:17PM -0600, Dave Jackson alleged:
> Garrick,
> 
>   This is the way it has always 'worked' since early OpenPBS days.  We
> assumed that 'rerun' was supposed to just 'requeue' the job.  If this is
> in fact broken, fixing it will break a number of tools which depend on
> the 'broken' behavior.  Currently, to get a rerun effect, we use qrerun
> followed by qrun.

I think I see the issue here.

pbs_server initially sends a SignalJob to the MS (mother superior) MOM to
kill the processes, and the MOM sends back the resultant JobObit.  The
problem is that pbs_server responds to the JobObit with a DeleteJob _and_
a RerunJob.
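
To make the sequence concrete, here is a rough toy model of the exchange
as I read it.  The request names (SignalJob, JobObit, DeleteJob, RerunJob)
are the ones above; the functions, the trace format, and the order of the
two server replies are my own assumptions for illustration and are not
taken from the TORQUE source.

```python
# Hypothetical model of the message exchange described above.
# Only the request names come from the post; everything else is illustrative.

def mom_handle(request):
    """Toy MOM: killing the job's processes yields an obituary."""
    if request == "SignalJob":
        return "JobObit"
    return None


def server_replies_to_obit(send_delete, send_rerun):
    """Return the requests the toy server sends after receiving a JobObit."""
    replies = []
    if send_delete:
        replies.append("DeleteJob")
    if send_rerun:
        replies.append("RerunJob")
    return replies


def rerun_trace(send_delete=True, send_rerun=True):
    """Build the ordered (sender, message) trace for one rerun request."""
    trace = [("pbs_server", "SignalJob")]
    trace.append(("MOM", mom_handle("SignalJob")))
    for msg in server_replies_to_obit(send_delete, send_rerun):
        trace.append(("pbs_server", msg))
    return trace


if __name__ == "__main__":
    # Print the trace with both replies enabled (the behavior described above).
    for sender, msg in rerun_trace():
        print(f"{sender:>10} -> {msg}")
```

Calling rerun_trace(send_delete=False) or rerun_trace(send_rerun=False)
shows what the exchange would look like if the server sent only one of
the two replies.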


-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California