[torquedev] [torqueusers] Problem with ppn and routing

David Singleton David.Singleton at anu.edu.au
Mon Dec 6 16:14:27 MST 2010


On 12/07/2010 09:28 AM, "Mgr. Šimon Tóth" wrote:
>> This does not correspond to my definition of "at all". What else should
>> Torque do with a resource in order that one can consider that it views
>> it? Pbs_mom passes to the server how much mem is used locally and the
>> server sums up the result. The server understands mem enough so that it
>> can route jobs according to the value requested. What else should it do?

Actually, the server doesn't sum it up, the MS does before sending it on to
the server. The server really does treat all resources (except nodes) as
abstract and attaches no semantics to them.  The MOM, on the other hand,
obviously does attach meaning and last time I looked (about 2.3.3) it attached
two contradictory meanings in the case of vmem.  In encode_used(), it sums
vmem from sisters to send a total job usage to the server. In mom_over_limit(),
each sister applies the vmem request as the per node limit.
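
To make the mismatch concrete, here is a minimal Python sketch of the two
interpretations side by side. It is not the Torque C code: the function
names, node names and figures are all hypothetical, and it only mirrors the
behaviour described above (summing across sisters for reporting versus
comparing each node against the full request).

```python
# Hypothetical illustration only; this does not reproduce Torque's C code.

def reported_job_vmem(per_node_usage):
    """Job-wide view: total the vmem used on every node of the job."""
    return sum(per_node_usage.values())


def nodes_over_limit(per_node_usage, vmem_request):
    """Per-node view: each node compares only its own usage with the full request."""
    return [node for node, used in per_node_usage.items() if used > vmem_request]


usage_gb = {"node01": 3, "node02": 3}  # made-up usage figures, in GB
request_gb = 4                         # made-up vmem request, in GB

total = reported_job_vmem(usage_gb)
offenders = nodes_over_limit(usage_gb, request_gb)

# The job-wide total exceeds the request, yet no single node trips the limit.
print("reported total:", total, "GB; nodes over limit:", offenders)
```

With these numbers the reported usage is above the request while the
per-node check never fires, which is the contradiction in question.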

In our code, we treat vmem (and mem) as whole-job measures but apply limits
at the node level "pro rata" according to cpu allocations, i.e. a node with
fraction X of the job cpus gets fraction X of the memory allocation.
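
A rough Python sketch of that pro rata rule, assuming the job-wide memory
request and the per-node cpu counts are already known. The function name,
node names and numbers are made up for illustration; this is not the code
referred to above.

```python
# Hypothetical sketch of a pro rata split; names and units are assumptions.

def pro_rata_limits(job_mem_mb, cpus_by_node):
    """Split a job-wide memory request across nodes in proportion to cpus.

    A node holding fraction X of the job's cpus gets fraction X of the
    memory (rounded down to a whole MB).
    """
    total_cpus = sum(cpus_by_node.values())
    if total_cpus <= 0:
        raise ValueError("job must have at least one cpu allocated")
    return {
        node: job_mem_mb * cpus // total_cpus
        for node, cpus in cpus_by_node.items()
    }


# Example: 8000 MB requested, 6 cpus on one node and 2 on another.
limits = pro_rata_limits(8000, {"node01": 6, "node02": 2})
print(limits)
```

Integer division keeps the limits in whole MB; a real implementation would
have to decide what to do with any remainder.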

>
> I should have formulated that better. Checking requests against absolute
> limits (and routing jobs according to those limits) is supported. What
> isn't supported is scheduling jobs on nodes according to requested
> resources.
>
> -l nodes=1:mem=100GB or -l nodes=1 -l mem=100GB will run on any node.
>


Can we avoid the "should the server be able to schedule" debate for a
minute?

I don't think the first of these has any meaning right now - I think the
":mem=100GB" is treated as a node property.   But I like the idea, it
would allow us to have non-uniform memory requests across nodes. It seems
to me that -lnodes=2:mem=100MB should be equivalent to -lmem=200MB,nodes=2.
Possibly the first should have a syntax like -lnodes=2:mempn=100MB to make
it less ambiguous.
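
The proposed equivalence can be sketched in Python as below. Note that
"mempn" is only the syntax suggested above, not something Torque is known to
accept, and the parser here is a hypothetical toy that handles just the form
nodes=N:mempn=<size>MB or GB.

```python
import re

# Hypothetical toy parser for the proposed "mempn" syntax; not a Torque API.

def job_mem_mb(nodes_spec):
    """Return the job-wide memory in MB implied by 'nodes=N:mempn=<size>MB|GB'."""
    match = re.fullmatch(r"nodes=(\d+):mempn=(\d+)(MB|GB)", nodes_spec)
    if match is None:
        raise ValueError("unsupported spec: " + nodes_spec)
    node_count = int(match.group(1))
    amount = int(match.group(2))
    unit = match.group(3)
    per_node_mb = amount * 1024 if unit == "GB" else amount
    # Per-node memory times node count gives the equivalent whole-job request.
    return node_count * per_node_mb


# nodes=2:mempn=100MB would then mean the same as mem=200MB,nodes=2.
print(job_mem_mb("nodes=2:mempn=100MB"))
```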

David

