[torqueusers] ppn + hostlist

Martin Siegert siegert at sfu.ca
Wed Dec 4 13:19:21 MST 2013


Hi,

Yup. We have seen this as well.
  -l nodes=node1+node2
allocates two cores on node1. But
  -l nodes=node1+1
actually gives you 1 core on node1 and another core on a different node.
It is my understanding that this is a Moab bug.
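
For reference, the explicit per-host form that Glen gives in the quoted
thread below (nodes=node01:ppn=2+node02:ppn=2+...) is easy to generate from
a host list. Here is a minimal Python sketch, assuming a hypothetical helper
name build_nodes_spec (it is not part of Torque or Moab). It only builds the
string; whether Moab honours the request is a separate question, as the
thread shows.

```python
def build_nodes_spec(hosts, ppn):
    """Return a nodes string such as 'node01:ppn=2+node02:ppn=2'.

    build_nodes_spec is a hypothetical helper for illustration only.
    """
    if not hosts:
        raise ValueError("need at least one host")
    if ppn < 1:
        raise ValueError("ppn must be a positive integer")
    # One 'host:ppn=N' term per host, joined with '+'.
    return "+".join(f"{host}:ppn={ppn}" for host in hosts)


if __name__ == "__main__":
    spec = build_nodes_spec(["node01", "node02", "node03"], 2)
    # Print the command line rather than running it.
    print(f"qsub -l nodes={spec}")
```

Note that this requests every host in the list, which is not the same as
"any 2 nodes out of this list".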

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
Simon Fraser University

On Wed, Dec 04, 2013 at 02:51:46PM -0500, Matt Britt wrote:
> 
>    I don't know if it is related, but there is an issue with Moab when
>    asking for multiple named nodes in node-exclusive mode. Interestingly,
>    it is the opposite problem from what Brian mentions: tasks are combined
>    down to the first named node.
>
>    #PBS -l nodes=node1+node2
>    #PBS -n
>
>    Checkjob would return something like:
>
>    Allocated Nodes:
>    [node1:2]
> 
>    --------------------------------------------
>    Matthew Britt
>    CAEN HPC Group - College of Engineering
>    msbritt at umich.edu
> 
>    On Wed, Dec 4, 2013 at 1:09 PM, Glen Beane <glen.beane at gmail.com>
>    wrote:
> 
>    I'm guessing this is a Moab issue. As far as I know, Torque by itself
>    has never supported what you are trying to do (and the fact that
>
>    nodes=compute-3-1:ppn=2+compute-3-5:ppn=2+compute-7-1:ppn=2+compute-7-3:ppn=2+compute-3-7:ppn=2
>
>    does not work indicates that Moab is doing something strange to the
>    resource request).
>
>    On Wed, Dec 4, 2013 at 12:38 PM, Andrus, Brian Contractor
>    <bdandrus at nps.edu> wrote:
> 
>    Glen,
>
>    Thanks for the clarity; however, that still doesn't work.
>
>    qsub -I -l nodes=compute-3-1:ppn=2+compute-3-5:ppn=2+compute-7-1:ppn=2+compute-7-3:ppn=2+compute-3-7:ppn=2
>
>    The job shows up waiting to run and checkjob tells me:
>
>    Req[0]  TaskCount: 8  Partition: ALL
>    Opsys: ---  Arch: ---  Features: compute-3-1
>    Dedicated Resources Per Task: PROCS: 1  MEM: 1024M
>    Required HostList:
>    [compute-3-5:2][compute-7-1:2][compute-7-3:2][compute-3-7:2]
>
>    So the HostList comprises all the listed nodes EXCEPT the first
>    one, which gets tagged as a feature.
>
>    Also, that seems to require that I specify exactly everything I need.
>
>    The scenario I am working with is: I want 2 nodes with 2 ppn each, but
>    they can come from any of the nodes in a list of several.
>
>    Using the correct syntax, I would end up with 2 procs on each node
>    listed.
>
>    I see a similar issue when trying to use procs with a nodelist.
>
>    I'm not sure if that is possible, but apparently users had been doing
>    that too:
>
>    qsub -I -l procs=32 -l nodes=compute-3-1+compute-3-5+compute-7-1+compute-7-3+compute-3-7
>
>    This again creates a HostList of all but the first named node, which is
>    tagged as a feature requirement.
>
>    Seems like something is parsing the request oddly.
>
>    Brian Andrus
>    ITACS/Research Computing
>    Naval Postgraduate School
>    Monterey, California
>    voice: 831-656-6238
>
>    From: torqueusers-bounces at supercluster.org
>    [mailto:torqueusers-bounces at supercluster.org] On Behalf Of
>    glen.beane at gmail.com
>    Sent: Wednesday, December 04, 2013 6:03 AM
>    To: Torque Users Mailing List
>    Subject: Re: [torqueusers] ppn + hostlist
>
>    The correct syntax has always been
>    nodes=node01:ppn=2+node02:ppn=2+node03:ppn=2
>    Sent from my iPhone
> 
>    On Dec 4, 2013, at 2:12 AM, "Andrus, Brian Contractor"
>    <bdandrus at nps.edu> wrote:
> 
>    All,
>
>    Something seems to have changed in either Torque or Moab (I am thinking
>    Moab).
>
>    If I want to request 2 nodes with 2 ppn from a particular hostlist, we
>    used to do:
>
>    qsub -l nodes=2:ppn=2:node01+node02+node03
>
>    But now that does not work. It errors with:
>
>    qsub: submit error (Job rejected by all possible destinations (check
>    syntax, queue resources, ...))
>
>    However, I can do:
>
>    qsub -l nodes=2:ppn=2 -l nodes=node01+node02+node03
>
>    But that gives me 3 procs from those available in the nodelist
>    (basically, it ignores the first -l directive).
>
>    And if I try:
>
>    qsub -l nodes=1:node01+node02+node03
>
>    It ends up treating the node names as required features, which of
>    course no single node has all of o.O
>
>    Such jobs never run until I force them with qrun.
>
>    So, how do I request X nodes with Y ppn from resources constrained to
>    a particular list of hosts?
>
>    The current use case is for users who want to ensure their job lands
>    on the same nodes as they do timing comparisons.
>
>    This is Torque 4.2.6 and Moab 7.2.6.
>
>    Brian Andrus
>    ITACS/Research Computing
>    Naval Postgraduate School
>    Monterey, California
>    voice: 831-656-6238

