[torqueusers] dependencies on completed jobs
nathaniel.x.woody at gsk.com
Tue Sep 5 16:03:12 MDT 2006
I think the situation Sam has is slightly different, and I can confirm
the problem (in practice and with a simple test). The difference is to
wait 2 minutes between the submission of 81384 and 81385, so that 81384
has already completed when 81385 is submitted.
This stems from something I've griped about before: for a dependency to
be recognized correctly by Torque, the job being depended on must
currently be in the queue (I could be off here; I'm not sure what all of
the legal states are, but I don't believe C is one of them). If the job
ID named in the dependency isn't currently in the queue, the submitted
job gets held. I'm not willing to say what the correct behavior is
there, though.
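One possible workaround is to look at the state of the job being depended
on before submitting, and only attach the dependency when that job is not
already completed. This is a minimal sketch, not a statement of how Torque
is meant to be used: the helper names job_state and depend_opt are
hypothetical, and it assumes that "qstat -f" prints a "job_state = X" line
for a job the server still knows about.

```shell
# Hypothetical helpers, not part of Torque.

# Read `qstat -f <jobid>` output on stdin and print the state letter
# from the first "job_state = X" line. Prints nothing if no such line.
job_state() {
    sed -n 's/^[[:space:]]*job_state = \([A-Z]\).*$/\1/p' | head -n 1
}

# Print the qsub dependency option for job $1 given its state letter $2.
# For a completed (C) or unknown (empty) job, print nothing, since a
# dependency on such a job is the case reported to leave the new job held.
depend_opt() {
    jobid=$1
    state=$2
    case "$state" in
        ''|C) ;;
        *) printf '%s\n' "-W depend=afterok:$jobid" ;;
    esac
}

# Intended use on a submit host (left commented out because it needs Torque):
#   state=$(qstat -f 81384 2>/dev/null | job_state)
#   echo sleep 60 | qsub $(depend_opt 81384 "$state")
```

Note that dropping the dependency means the new job runs whether or not
the earlier job exited successfully, so the exit status would have to be
checked separately if afterok semantics matter. There is also a window
between the qstat call and the qsub call in which the earlier job can
complete.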
Nate
"Garrick Staples" <garrick at clusterresources.com>
Sent by: torqueusers-bounces at supercluster.org
05-Sep-2006 17:39
To: torqueusers at supercluster.org
cc:
Subject: Re: [torqueusers] dependencies on completed jobs
On Tue, Sep 05, 2006 at 12:45:27PM -0700, Sam Rash alleged:
> So we've noticed that if we submit job A, then submit job B which depends on
> A (-W depend=afterok:A_job_id), and A has already completed (we have
> keep_completed set to at least 30 min), B gets stuck in the hold state. Is
> this intentional? Or a bug?
>
>
>
> It seems like B should surely run.
>
> (maybe A updates its dependents when it completes and B won't check
> explicitly?)
The simple test works fine for me.
[garrick at hpcjr-master garrick]$ echo sleep 60 | qsub
81384.hpcjr-master.usc.edu
[garrick at hpcjr-master garrick]$ echo sleep 60 | qsub -W depend=afterok:81384
81385.hpcjr-master.usc.edu
[garrick at hpcjr-master garrick]$ echo sleep 60 | qsub -W depend=afterok:81385
81386.hpcjr-master.usc.edu
After 1.5 minutes:
81384.hpcjr-master.u garrick batch STDIN 10643 1 -- -- 01:00 C 00:00
81385.hpcjr-master.u garrick batch STDIN 10804 1 -- -- 01:00 R --
81386.hpcjr-master.u garrick batch STDIN -- 1 -- -- 01:00 H --
And after 2.5 minutes:
81384.hpcjr-master.u garrick batch STDIN 10643 1 -- -- 01:00 C 00:00
81385.hpcjr-master.u garrick batch STDIN 10804 1 -- -- 01:00 C 00:00
81386.hpcjr-master.u garrick batch STDIN 10910 1 -- -- 01:00 R --