[torquedev] nodes, procs, tpn and ncpus

"Mgr. Šimon Tóth" SimonT at mail.muni.cz
Wed Jun 9 08:05:36 MDT 2010


> Currently when TORQUE is asked to run a job with qrun it interprets the nodes=x as only a single node. Glen, if you look at listelem and node_spec you will see this is the case. TORQUE also ignores procs and ncpus.

Torque ignores any resource request (-l resource=value) when determining
where a job will run.

Torque also ignores any such request inside a nodespec, with the
exception of ppn.

For example, a job submitted with -l nodes=vmem=10G will simply never run.

> I am going to modify TORQUE so it will process these resources more like we expect.

If you wait a week, I will provide a full patch for generic resources.

> procs=x will mean give me x processors anywhere.

You mean -l procs=x?

> nodes=x will mean the same as procs=x.

I disagree: -l nodes=x means give me x nodes.

> nodes=x:ppn=x will work as it currently does except that the value for nodes will not be ignored.

It isn't ignored now: -l nodes=x:ppn=y means give me x nodes with y
processes each.

> That is a node spec of -l nodes=2:ppn=2 will look for two nodes with two available processors. This can be satisfied on the same host or different hosts. Currently this node spec will only get two processors on a single node.

That's wrong: currently this spec gets two nodes with two processes each,
and that is how it should work.
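To make the interpretation above concrete, here is a minimal Python sketch (not TORQUE's actual parser) that expands a nodespec such as 2:ppn=2 into one request per node, so x nodes with y processes each yields x separate entries. NodeRequest and parse_nodespec are hypothetical names; only the node count, the "+" separator between node groups, and ppn are understood, and every other property is ignored as a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeRequest:
    """One requested node (hypothetical structure, not TORQUE's internal type)."""
    ppn: int = 1


def parse_nodespec(spec: str) -> list[NodeRequest]:
    """Expand a spec such as "2:ppn=2" into one NodeRequest per node.

    Simplified: a leading number is the node count (anything else, e.g. a
    hostname, counts as one node); only the ppn property is interpreted.
    """
    requests: list[NodeRequest] = []
    for part in spec.split("+"):          # "+" separates independent node groups
        fields = part.split(":")
        count = int(fields[0]) if fields[0].isdigit() else 1
        ppn = 1
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "ppn":
                ppn = int(value)
        # x nodes with y processes each -> x distinct entries, each with ppn=y
        requests.extend(NodeRequest(ppn=ppn) for _ in range(count))
    return requests


reqs = parse_nodespec("2:ppn=2")
print(len(reqs), sum(r.ppn for r in reqs))  # two nodes, four processes in total
```

Under this reading, nodes=2:ppn=2 always means two distinct nodes, never one node with four free processors.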

> ncpus=x will allocate x processors to a single task. They must be on the same host.

ncpus should be a generic node resource, counted like any other generic
node resource.

> tpn can be used like ppn and it will be interpreted to mean use exactly x processors from each node.
> A node spec of nodes=2:tpn=2 will allocate two processors on one node and two processors on a separate node.

I wouldn't go that far. We are considering something like joining
requests when they can be satisfied on one node, but this should work
generically for any resource and be enabled by some global flag (similar
to #shared).

Let's say you want two nodes with 4 CPUs and 4G of memory each, but you
don't mind being assigned a single node that satisfies the whole request;
then you would write:

-l nodes=2:ncpus=4:mem=4G#join (or something similar)
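The #join flag is only a proposal in this thread, so the following is a hypothetical Python sketch of the intended semantics rather than anything TORQUE implements. Node, place and the first-fit strategy are all assumptions: with join enabled, one node able to hold the aggregate request (count times ncpus and count times mem) is accepted; otherwise count distinct nodes, each satisfying the per-node request, are required.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpus: int
    free_mem_gb: int


def place(nodes: List[Node], count: int, ncpus: int, mem_gb: int,
          join: bool = False) -> Optional[List[str]]:
    """Return one node name per requested node, or None if unsatisfiable.

    Hypothetical "#join" semantics: with join=True, a single node that can
    hold the whole aggregate request is preferred; otherwise `count`
    distinct nodes are chosen first-fit.
    """
    if join:
        for node in nodes:
            if (node.free_cpus >= count * ncpus
                    and node.free_mem_gb >= count * mem_gb):
                return [node.name] * count  # every "node" maps onto one host
    chosen = [n.name for n in nodes
              if n.free_cpus >= ncpus and n.free_mem_gb >= mem_gb][:count]
    return chosen if len(chosen) == count else None


cluster = [Node("big", 16, 32), Node("small1", 4, 8), Node("small2", 4, 8)]
print(place(cluster, 2, 4, 4))             # two distinct nodes
print(place(cluster, 2, 4, 4, join=True))  # whole request joined onto "big"
```

Keeping join as an explicit per-request switch mirrors the idea that it should behave like #shared: without it, nodes=2 keeps meaning two separate nodes.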

-- 
Mgr. Šimon Tóth


