[torqueusers] arrays and job dependencies

Naveed Near-Ansari naveed at caltech.edu
Fri May 15 16:49:23 MDT 2009


(I apologize if you receive this twice. The first copy was sent from an
unsubscribed address, and I don't think it went through.)

I am running Torque 2.3.6 and Maui 3.2.6p21.


I am trying to reconfigure FSL for batch submission using Torque.  FSL's
batch support is based on SGE.  The software submits a number of jobs
(some of them arrays) that are held on other jobs.  When a job is an
array, Torque seems to split out a separate job ID for each array element.

I submit a job that sleeps for a while:

[naveed at hostname ~]$ qsub -t 1-1 testscript
1306.hostname.caltech.edu

I then submit the same script as a job that depends on the job ID of the
first:

[naveed at hostname ~]$ qsub -W depend=afterany:1306.hostname.caltech.edu testscript
1308.hostname.caltech.edu

It is then held indefinitely, because that job ID is not valid: the
array changes the actual job ID from the one shown to 1306-1. PBS
reports:

PBS Job Id: 1308.hostname.caltech.edu
Job Name:   array-hold-test
Aborted by PBS Server 
Dependency request for job rejected by 1306.hostname.caltech.edu
Unknown Job Id
Job held for unknown job dep, use 'qrls' to release

The logs show:

05/15/2009 12:14:08;0100;PBS_Server;Job;1308.hostname.caltech.edu;enqueuing into default, state 1 hop 1
05/15/2009 12:14:08;0080;PBS_Server;Job;1306.hostname.caltech.edu;Unknown Job Id
05/15/2009 12:14:08;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=RegisterDependency, from @hostname.caltech.edu
05/15/2009 12:14:08;0008;PBS_Server;Job;1308.hostname.caltech.edu;Job Queued at request of naveed at hostname.caltech.edu, owner = naveed at hostname.caltech.edu, job name = array-hold-test, queue = default
05/15/2009 12:14:08;0008;PBS_Server;Job;1308.hostname.caltech.edu;Dependency request for job rejected by 1306.hostname.caltech.edu


Is it possible to hold a job on the whole array rather than on a single
element of the array?


Reading through the archives, I found a message about a similar issue,
and the docs mention that only qdel can act on a whole array. Has
anything changed in array handling since then?

Any clever ideas on how to get around this?
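
One workaround to consider, as an untested sketch: make the dependent
job wait on every array element individually instead of on the ID that
qsub prints. It assumes that element IDs have the form
<number>-<index>.<server> (as with the 1306-1 above) and that the server
accepts such IDs in a colon-separated depend list. Neither assumption is
confirmed for Torque 2.3.6. The helper name build_array_depend is
hypothetical and not part of Torque.

```shell
#!/bin/sh
# Build a dependency string that names every element of a job array.
# ASSUMPTIONS (unverified on Torque 2.3.6): element IDs look like
# <number>-<index>.<server>, and the server accepts them in a depend list.
# build_array_depend is a hypothetical helper, not a Torque command.
build_array_depend() {
    dep_type=$1   # dependency type, e.g. afterany
    array_id=$2   # ID printed by "qsub -t", e.g. 1306.hostname.caltech.edu
    first=$3      # first array index
    last=$4       # last array index

    num=${array_id%%.*}     # numeric part, e.g. 1306
    server=${array_id#*.}   # server part, e.g. hostname.caltech.edu

    dep=$dep_type
    i=$first
    while [ "$i" -le "$last" ]; do
        dep="$dep:$num-$i.$server"
        i=$((i + 1))
    done
    printf '%s\n' "$dep"
}

# Usage (commented out so the sketch has no side effects):
# first=$(qsub -t 1-3 testscript)
# qsub -W depend="$(build_array_depend afterany "$first" 1 3)" testscript
```

The function only assembles the string. Whether the server then
registers the dependency against each element is exactly the open
question here.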


