[torqueusers] arrays and job dependencies
Naveed Near-Ansari
naveed at caltech.edu
Fri May 15 16:49:23 MDT 2009
{I apologize if you receive this twice. It was sent from an unsubscribed address and I don't think the first copy went through.}
Running Torque 2.3.6 and Maui 3.2.6p21.
I am trying to reconfigure FSL for batch submission using Torque. FSL's
batch support is built around SGE. The software submits a number of jobs
(some of them arrays) that are held on other jobs. When a job is an array,
Torque seems to give each array element its own job ID.
I submit a job that sleeps for a while:
[naveed at hostname ~]$ qsub -t 1-1 testscript
1306.hostname.caltech.edu
I then submit the same job with a dependency on the job ID of the first:
[naveed at hostname ~]$ qsub -W depend=afterany:1306.hostname.caltech.edu testscript
1308.hostname.caltech.edu
It is held indefinitely because the dependency's job ID is rejected as
invalid: because of the array, the real job ID is 1306-1, not the 1306
that qsub printed. The notification reads:
PBS Job Id: 1308.hostname.caltech.edu
Job Name: array-hold-test
Aborted by PBS Server
Dependency request for job rejected by 1306.hostname.caltech.edu
Unknown Job Id
Job held for unknown job dep, use 'qrls' to release
The server log shows:
05/15/2009 12:14:08;0100;PBS_Server;Job;1308.hostname.caltech.edu;enqueuing into default, state 1 hop 1
05/15/2009 12:14:08;0080;PBS_Server;Job;1306.hostname.caltech.edu;Unknown Job Id
05/15/2009 12:14:08;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=RegisterDependency, from @hostname.caltech.edu
05/15/2009 12:14:08;0008;PBS_Server;Job;1308.hostname.caltech.edu;Job Queued at request of naveed at hostname.caltech.edu, owner = naveed at hostname.caltech.edu, job name = array-hold-test, queue = default
05/15/2009 12:14:08;0008;PBS_Server;Job;1308.hostname.caltech.edu;Dependency request for job rejected by 1306.hostname.caltech.edu
Is it possible to hold a job on the whole array rather than on a single
element of the array?
Reading through the archives, I found a message about a similar issue,
and the docs say only qdel can act on a whole array. Has array handling
changed in this respect since then?
Any clever ideas on how to get around this?
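One possible workaround, untested here, assumes the server will accept a dependency on each element ID in the 1306-1.hostname.caltech.edu form described above. Because PBS-style dependencies take a colon-separated list of job IDs (afterany:jobid[:jobid...]), every element could be listed explicitly. The helper array_dependency below is hypothetical. It only builds that string from the ID qsub printed plus the -t range:

```python
def array_dependency(array_id: str, first: int, last: int,
                     dep_type: str = "afterany") -> str:
    """Build a -W depend= value naming every element of a job array.

    Hypothetical assumption: elements are named "<seq>-<index>.<server>",
    e.g. "1306-1.hostname.caltech.edu", as described above.
    """
    seq, sep, server = array_id.partition(".")
    # Re-attach the ".server" suffix only if the original ID had one.
    elements = [f"{seq}-{i}{sep}{server}" for i in range(first, last + 1)]
    # PBS dependency syntax: type:jobid[:jobid...]
    return dep_type + ":" + ":".join(elements)


if __name__ == "__main__":
    # For the "qsub -t 1-1" test job above.
    dep = array_dependency("1306.hostname.caltech.edu", 1, 1)
    print(f"qsub -W depend={dep} testscript")
```

For the -t 1-1 test this gives afterany:1306-1.hostname.caltech.edu. For a large array the list grows with the number of elements, so it may run into length limits. This does not answer whether Torque can depend on the array as a whole.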