[torqueusers] dependencies on completed jobs

nathaniel.x.woody at gsk.com nathaniel.x.woody at gsk.com
Tue Sep 5 16:03:12 MDT 2006


I think Sam's situation is slightly different, and I can confirm the
problem (both in practice and with a simple test).  The difference is
that you wait 2 minutes between submitting 81384 and 81385, so that 81384
has already completed by the time 81385 is submitted.

This stems from something I've griped about before: for a dependency to
be recognized correctly by Torque, the job being depended on must
currently be in the queue.  (I could be off here; I'm not sure what all
of the legal states are, but I don't believe C is one of them.)  If the
job named in the dependency isn't currently in the queue, the submitted
job gets held.  I'm not willing to say what the correct behavior is
there, though.
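
For anyone who wants to try the delayed case themselves, here is a rough
sketch of it as a script.  It only uses the same qsub invocations shown in
Garrick's test below; the function names and the delay argument are mine
(hypothetical), and it assumes a working Torque client on PATH.  Treat it
as a sketch, not a definitive test.

```shell
#!/bin/sh
# Sketch only: reproduce "dependency on an already-completed job".
# Assumes qsub is on PATH; submit_sleep and reproduce are illustrative names.

# Submit a 60-second sleep job. If a job ID is passed as $1, make the
# new job depend on it with afterok. Prints the new job ID (qsub's output).
submit_sleep() {
    if [ -n "$1" ]; then
        echo "sleep 60" | qsub -W "depend=afterok:$1"
    else
        echo "sleep 60" | qsub
    fi
}

# Submit job A, wait $1 seconds (default 120, i.e. long enough for the
# 60-second job to finish), then submit job B depending on A.
# Prints both job IDs on one line.
reproduce() {
    delay=${1:-120}
    first=$(submit_sleep) || return 1
    sleep "$delay"
    second=$(submit_sleep "$first") || return 1
    echo "$first $second"
}
```

Calling "reproduce 120" and then looking at the second job ID in qstat
should show whether it ends up in H, as I see here, or runs.  Calling
"reproduce 0" should correspond to the back-to-back submission that works.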

Nate








"Garrick Staples" <garrick at clusterresources.com> 
Sent by: torqueusers-bounces at supercluster.org
05-Sep-2006 17:39

To: torqueusers at supercluster.org
cc:
Subject: Re: [torqueusers] dependencies on completed jobs






On Tue, Sep 05, 2006 at 12:45:27PM -0700, Sam Rash alleged:
> So we've noticed that if we submit job A, then submit job B which
> depends on A (-W depend=afterok:A_job_id), and A has already completed
> (we have keep_completed set to at least 30 min), B gets stuck in the
> hold state.  Is this intentional?  Or a bug?
>
> It seems like B should surely run.
>
> (maybe A updates its dependents when it completes and B won't check
> explicitly?)

The simple test works fine for me.

[garrick at hpcjr-master garrick]$ echo sleep 60 | qsub
81384.hpcjr-master.usc.edu
[garrick at hpcjr-master garrick]$ echo sleep 60 | qsub -W depend=afterok:81384
81385.hpcjr-master.usc.edu
[garrick at hpcjr-master garrick]$ echo sleep 60 | qsub -W depend=afterok:81385
81386.hpcjr-master.usc.edu

After 1.5 minutes:
81384.hpcjr-master.u garrick  batch    STDIN       10643     1  --    -- 01:00 C 00:00
81385.hpcjr-master.u garrick  batch    STDIN       10804     1  --    -- 01:00 R   --
81386.hpcjr-master.u garrick  batch    STDIN         --      1  --    -- 01:00 H   --

And after 2.5 minutes:
81384.hpcjr-master.u garrick  batch    STDIN       10643     1  --    -- 01:00 C 00:00
81385.hpcjr-master.u garrick  batch    STDIN       10804     1  --    -- 01:00 C 00:00
81386.hpcjr-master.u garrick  batch    STDIN       10910     1  --    -- 01:00 R   --

