[torqueusers] Torque 4.0 and job arrays

Ken Nielson knielson at adaptivecomputing.com
Tue Apr 24 09:07:24 MDT 2012


On Tue, Apr 24, 2012 at 12:23 AM, Rhys Hill <rhys.hill at adelaide.edu.au> wrote:

> Hi David,
>
> Thanks for that. I've just found and fixed some other bugs, which I've
> added to Bugzilla. The one issue that remains is odd: an array appears to
> be stuck even though all of its component jobs have finished.
>
> For instance, qstat -f says this:
>
> Job Id: 678[].moby.cs.adelaide.edu.au
>    Job_Name = YZ_Oxford_group
>    Job_Owner = yanzhichen at moby.cs.adelaide.edu.au
>    job_state = Q
>    queue = batch
>    server = moby.cs.adelaide.edu.au
>    Checkpoint = u
>    ctime = Tue Apr 24 09:26:10 2012
>    Error_Path = moby.cs.adelaide.edu.au:/home/yanzhichen/moby/oxbuilding_vocabulary/out.e.txt
>    Hold_Types = n
>    Join_Path = n
>    Keep_Files = n
>    Mail_Points = a
>    mtime = Tue Apr 24 09:26:10 2012
>    Output_Path = moby.cs.adelaide.edu.au:/home/yanzhichen/moby/oxbuilding_vocabulary/out.o.txt
>    Priority = 0
>    qtime = Tue Apr 24 09:26:10 2012
>    Rerunable = True
>    Resource_List.mem = 5gb
>    Resource_List.nodect = 1
>    Resource_List.nodes = 1:ppn=1
>    Resource_List.pmem = 5gb
>    Resource_List.pvmem = 8gb
>    Resource_List.walltime = 48:00:00
>    etime = Tue Apr 24 09:26:10 2012
>    submit_args = -t 2-11 ./job_dogroup
>    job_array_request = 2-11
>    fault_tolerant = False
>    job_radix = 0
>    submit_host = moby.cs.adelaide.edu.au
>    init_work_dir = /home/yanzhichen/moby/oxbuilding_vocabulary
>
> whereas qstat -ft has no mention of 678[x] at all. qdel and qdel -p have
> no effect on jobs like these. I think I've submitted a fix for the problem
> that leads to the job getting into this state, but it would be handy if
> qdel could remove it.
>
Rhys,

To list every element of an array, or to delete a single element, you need
to use the -t option. For example, qstat -t lists not only the array master
but also every job in the array, along with its current state.

qdel works the same way: use qdel -t to delete an individual job in the
array.
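
As a rough sketch, the commands for the array shown above would look
something like this. The array ID 678[] and the range 2-11 come from the
qstat -f output; the element number 5 is only an illustration, and the
run helper and DRY_RUN variable are hypothetical names added so the
commands are printed rather than executed by default. Whether qdel -t also
clears an array master that is already stuck is not something this sketch
can promise.

```shell
# Array ID taken from the qstat -f output quoted above.
ARRAY_ID='678[]'

# Hypothetical helper: with DRY_RUN=1 (the default here) the command is
# only printed; set DRY_RUN=0 to actually run it against the server.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# List the array master and every sub-job with its current state.
run qstat -t "$ARRAY_ID"

# Delete a single element of the array (element 5 is an example).
run qdel -t 5 "$ARRAY_ID"

# Delete the whole range that was submitted with "-t 2-11".
run qdel -t 2-11 "$ARRAY_ID"
```

Quoting the array ID keeps the shell from treating the square brackets as a
glob pattern.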

Ken