[torquedev] nodes, procs, tpn and ncpus
"Mgr. Šimon Tóth"
SimonT at mail.muni.cz
Wed Jun 9 08:39:31 MDT 2010
On 9.6.2010 16:34, Glen Beane wrote:
> On Wed, Jun 9, 2010 at 10:24 AM, Bas van der Vlies <basv at sara.nl> wrote:
>> On 09-06-10 16:00, Glen Beane wrote:
>>> On Wed, Jun 9, 2010 at 9:45 AM, Glen Beane<glen.beane at gmail.com> wrote:
>>>> On Wed, Jun 9, 2010 at 8:57 AM, Ken Nielson
>>>> <knielson at adaptivecomputing.com> wrote:
>>>>> Currently, when TORQUE is asked to run a job with qrun, it interprets nodes=x as only a single node. Glen, if you look at listelem and node_spec you will see this is the case. TORQUE also ignores procs and ncpus.
>>>> If I stop Moab on my cluster and run jobs with qrun, nodes=x is not
>>>> interpreted as a single node. It is basically interpreted as follows:
>>>> gbeane at wulfgar:~> echo "sleep 60" | qsub -l nodes=4,walltime=00:02:00
>>>> gbeane at wulfgar:~> qrun 69795
>>>> gbeane at wulfgar:~> qstat -f 69795
>>>> exec_host = cs-prod-6/0+cs-prod-5/0+cs-prod-4/0+cs-prod-3/0
>>>> Resource_List.neednodes = 4
>>>> Resource_List.nodect = 4
>>>> Resource_List.nodes = 4
>>> by the way, I was just looking through an old OpenPBS manual and from
>>> what I could tell if you requested a number of nodes without
>>> specifying ppn you would be given complete nodes. I don't have a
>>> problem with TORQUE assuming ppn=1 if it is not specified though.
>> From the openpbs v2_2.ers.pdf manual:
>> * nodes: Number and/or type of nodes to be reserved for exclusive use by
>> the job.
>> . To ask for 12 nodes of any type: -l nodes=12
>> . To ask for 2 processors on each of four nodes:
>> -l nodes=4:ppn=2
>> . To ask for 4 processors on one node:
>> -l nodes=1:ppn=4
> nodes hasn't meant exclusive access in a long time; I don't suggest we
> revert to that.
That is something that should definitely be achievable.
Exclusive access to a node is a very common request (for I/O-, network-,
and memory-heavy jobs). The workarounds (requesting all of a node's
resources) simply don't work in a heterogeneous environment.
I currently have all nodes configured with only one processor, so that
the shared/exclusive states are interpreted correctly. Luckily that's
OK for me because I have a patch for generic resources on nodes, which
handles the rest (CPUs, memory, etc.).
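For reference, the single-processor workaround described above would look
roughly like this in TORQUE's server_priv/nodes file (a sketch only; the
hostnames are borrowed from Glen's exec_host output purely as placeholders):

```
# TORQUE_HOME/server_priv/nodes -- sketch of the np=1 workaround
# Advertising a single virtual processor per node means pbs_server
# assigns at most one job slot per node, so any job placed on the
# node effectively gets it exclusively.
cs-prod-3 np=1
cs-prod-4 np=1
cs-prod-5 np=1
cs-prod-6 np=1
```

pbs_server must be restarted (or the nodes updated via qmgr) for the change
to take effect; the trade-off is that multi-core nodes can no longer be
shared by independent single-core jobs.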
Mgr. Šimon Tóth