[torqueusers] Re: [torquedev] First Torque impressions on Altix
Michel Béland
michel.beland at rqchp.qc.ca
Tue Jan 6 22:19:35 MST 2009
Chris Samuel wrote:
>>We installed at our site version 2.3.5 of Torque, compiled with
>>--enable-cpuset on an Altix 3300 with four processors and 8 GB.
>>The machine has two nodes with two processors each. The memory
>>is also split in two between these nodes.
>>
>>
>
>Just to be clear, I'm presuming that here you mean NUMA nodes
>and not compute nodes ?
>
>Just "node" usually means the second, but I suspect you mean
>the first from our previous conversations. :-)
>
>
>
Indeed, I am talking about NUMA nodes.
>>- Thirdly, cpusets contain only cpu 0 when they are
>>launched with -lncpus instead of -lnodes.
>>
>>
>
>Oh that's ugly! I wonder if that's a symptom of a more
>general issue ?
>
>
Well, the way Torque treats -lncpus and -lnodes is broken in other ways
too. Because of the cpuset problem, I use the qsub wrapper that I
described in my previous email to convert -lncpus=n to
-lnodes=1:ppn=n, so that the cpusets are created correctly.
This has an annoying consequence: when -lncpus is not used,
qstat displays every job as having one node and one task. With Maui,
showq shows the number of processors in use correctly, but I think that
qstat should get it right too.
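
To illustrate the conversion described above, here is a minimal sketch of the argument rewriting such a wrapper could do. It is not the actual wrapper: the function names are hypothetical, it only handles the `-l ncpus=n` and `-lncpus=n` spellings, and it assumes that a request which already contains `nodes=` should be left untouched.

```python
import re

# Matches a single resource item of the form ncpus=<integer>.
NCPUS_RE = re.compile(r"^ncpus=(\d+)$")


def rewrite_resource_list(spec):
    """Rewrite one comma-separated -l resource list.

    ncpus=n becomes nodes=1:ppn=n. Assumption: if the user already
    gave a nodes= request, the list is returned unchanged.
    """
    items = spec.split(",")
    if any(item.startswith("nodes=") for item in items):
        return spec
    out = []
    for item in items:
        match = NCPUS_RE.match(item)
        out.append(f"nodes=1:ppn={match.group(1)}" if match else item)
    return ",".join(out)


def rewrite_qsub_args(args):
    """Return a copy of args with every -l resource list rewritten."""
    out = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-l" and i + 1 < len(args):
            # Separate form: -l ncpus=4
            out += ["-l", rewrite_resource_list(args[i + 1])]
            i += 2
        elif arg.startswith("-l") and len(arg) > 2:
            # Attached form: -lncpus=4
            out.append("-l" + rewrite_resource_list(arg[2:]))
            i += 1
        else:
            out.append(arg)
            i += 1
    return out
```

A real wrapper would pass the rewritten list on to the real qsub binary. Note that this sketch does not look at `#PBS -l` directives inside the job script, which would need the same treatment.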
Some testing showed that by specifying -lnodes=1:ppn=n together with
-lncpus=n, I would get a correct cpuset and a correct qstat output. This
works with pbs_sched, but Maui thinks that I need n*n cpus!
Another problem when -lncpus is not used is that queue limits on
ncpus are no longer obeyed. How can I specify queue limits so that a
job with -lnodes=1:ppn=n cannot run in this queue if n is larger than
32? I tried nodect, but it does not work any better.
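
Failing a queue-level answer, one conceivable workaround is to enforce the limit in the qsub wrapper itself. The sketch below is only an illustration of that idea, not a Torque feature: the function names are hypothetical, and it assumes the usual `count:ppn=n` node specification with `+` separating several specifications.

```python
def requested_processors(nodes_spec):
    """Count processors in a nodes= value such as '1:ppn=8' or '2:ppn=4+1:ppn=2'."""
    total = 0
    for chunk in nodes_spec.split("+"):
        parts = chunk.split(":")
        # A leading integer is a node count; anything else (a host name
        # or a node property) is assumed to stand for a single node.
        count = int(parts[0]) if parts[0].isdigit() else 1
        ppn = 1  # assumed default when ppn is not given
        for part in parts[1:]:
            if part.startswith("ppn="):
                ppn = int(part[len("ppn="):])
        total += count * ppn
    return total


def within_queue_limit(nodes_spec, max_procs=32):
    """True if the request fits under the (wrapper-enforced) processor limit."""
    return requested_processors(nodes_spec) <= max_procs
```

The wrapper could refuse to submit, or pick another queue, when `within_queue_limit` returns False.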
>However, if I specify nodes=1:ppn=2 I get the expected behaviour.
>
>I'm not sure if this is a Torque bug or a Moab bug!
>
>
I observe this with both pbs_sched and Maui.
>>With older versions of PBS Pro, probably more similar to
>>Torque then than it is today, using -lnodes on Altix did
>>not work quite well: memory requests were ignored. I do not
>>know if this problem would appear with Torque, though.
>>
>>
>
>Were they ignored by PBS Pro or were they getting set
>and the kernel was ignoring them ?
>
>
By PBS Pro.
>>- Fourthly, when I submit a sequential job followed
>>by a 2-cpu job, the first job gets a cpuset with
>>cpu 0 and the second a cpuset with cpus 1 and 2. This
>>is pretty annoying: the second job should get cpus 2 and
>>3 so that they are on the same node.
>>
>>
>
>The scheduler doesn't usually spread jobs across nodes if
>it doesn't have to, so I suspect here you mean this is
>across NUMA nodes.
>
>
>
Indeed. I work around this by not allowing truly sequential jobs: each
one gets a complete NUMA node. This way, the scheduler does not need any
knowledge of cpusets.
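
As a sketch of what "a complete NUMA node" means for the processor count, the wrapper can round every request up to a multiple of the number of cpus per NUMA node. The function name is hypothetical; the default of two cpus per NUMA node matches the Altix described at the top of this thread and would differ on other machines.

```python
def round_up_to_numa_nodes(ncpus, cpus_per_numa_node=2):
    """Round a cpu request up to a whole number of NUMA nodes.

    Returns the number of cpus to request so that the job never
    shares a NUMA node with another job.
    """
    if ncpus < 1 or cpus_per_numa_node < 1:
        raise ValueError("ncpus and cpus_per_numa_node must be positive")
    # Ceiling division without floating point.
    numa_nodes = -(-ncpus // cpus_per_numa_node)
    return numa_nodes * cpus_per_numa_node
```

With the default, a sequential job (1 cpu) is charged for 2 cpus, and a 3-cpu job for 4.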
>>In fact, if cpus 0 and 2 were busy, I would expect the job
>>to remain queued.
>>
>>
>
>Hmm, I think that has to be a local site policy decision.
>
>
>
Agreed.
>>As I want to make this work with pbs_sched or Maui because
>>of budget constraints, the best way to make this work for me
>>is to make sure that all the jobs use one or more complete nodes.
>>
>>
>
>Do you mean a complete compute node or a complete NUMA node ?
>
>If NUMA I don't think there's any way defined to request
>them in the PBS spec.
>
>
>
That is why I need a wrapper script.
>>- Fifthly, the cpusets contain all the nodes for memory,
>>instead of just the nodes needed according to the memory
>>request.
>>
>>
>
>Correct, that's something that I want to be able to solve
>with the NUMA support.
>
>
>
>>I guess that I can probably easily change Torque to
>>restrict the memory, provided that I use the dummy
>>qsub script described above.
>>
>>
>
>Well you can restrict the memory with ulimits, but
>that won't control which NUMA node it ends up being
>allocated on..
>
>
>
>
If you write 4-7 into the file mems in the cpuset, all the memory for the
job will be allocated on these NUMA nodes. That is what I mean by "restrict".
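
A minimal sketch of that step follows. The `mems` file name comes from the cpuset interface mentioned above; the function names and the idea of passing in the cpuset directory are assumptions made for illustration, and where Torque creates the per-job cpuset directory is not shown here.

```python
import os


def format_mems(numa_nodes):
    """Format NUMA node numbers as a cpuset list string, e.g. [4, 5, 6, 7] -> '4-7'."""
    nodes = sorted(set(numa_nodes))
    ranges = []
    start = prev = None
    for n in nodes:
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n  # extend the current run of consecutive nodes
        else:
            ranges.append((start, prev))
            start = prev = n
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def restrict_cpuset_memory(cpuset_dir, numa_nodes):
    """Write the allowed NUMA nodes into the 'mems' file of a cpuset directory.

    cpuset_dir is the directory of the job's cpuset in the mounted
    cpuset filesystem (the caller must know where that is).
    """
    with open(os.path.join(cpuset_dir, "mems"), "w") as f:
        f.write(format_mems(numa_nodes) + "\n")
```

Calling `restrict_cpuset_memory(cpuset_dir, [4, 5, 6, 7])` writes `4-7`, which is the example given above.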
Michel Béland
Réseau québécois de calcul de haute performance