[torqueusers] Re: [torquedev] First Torque impressions on Altix

Michel Béland michel.beland at rqchp.qc.ca
Tue Jan 6 22:19:35 MST 2009


Chris Samuel wrote:

>>We installed at our site version 2.3.5 of Torque, compiled with
>>--enable-cpuset on an Altix 3300 with four processors and 8 GB.
>>The machine has two nodes with two processors each. The memory
>>is also split in two between these nodes.
>
>Just to be clear, I'm presuming that here you mean NUMA nodes
>and not compute nodes ?
>
>Just "node" usually means the second, but I suspect you mean
>the first from our previous conversations. :-)

Indeed, I am talking about NUMA nodes.

>>- Thirdly, cpusets contain only cpu 0 when jobs are
>>submitted with -lncpus instead of -lnodes.
>
>Oh that's ugly!  I wonder if that's a symptom of a more
>general issue ?

Well, the way that Torque treats -lncpus and -lnodes is broken in some 
other ways too. Because of this problem, I used the qsub wrapper I 
mentioned in my previous email to convert -lncpus=n into -lnodes=1:ppn=n, 
to make sure that the cpusets are created correctly. This has an annoying 
consequence: since -lncpus is no longer used, qstat displays all the jobs 
as having one node and one task. With Maui, showq correctly shows the 
number of processors used, but I think that qstat should work too.
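
For the record, the wrapper is nothing fancy. A minimal sketch of the
idea looks like this (the real script handles more of the -l syntax, and
the path to the renamed real qsub below is only an example):

    #!/bin/sh
    # Rewrite -lncpus=n into -lnodes=1:ppn=n before calling the real qsub,
    # so that pbs_mom builds a cpuset with n cpus.  Sketch only: it does
    # not handle "-l ncpus=n" written with a space, nor combined -l lists.
    args=""
    for a in "$@"; do
        case "$a" in
            -lncpus=*) a="-lnodes=1:ppn=${a#-lncpus=}" ;;
        esac
        args="$args $a"
    done
    exec /usr/local/bin/qsub.real $args  # path to the real qsub (example)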

Some testing showed that by specifying -lnodes=1:ppn=n together with 
-lncpus=n, I would get a correct cpuset and a correct qstat output. This 
works with pbs_sched, but Maui thinks that I need n*n cpus!
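
In other words, a submission like this (job script name made up):

    qsub -l nodes=1:ppn=4 -l ncpus=4 job.sh

gives a four-cpu cpuset and a sensible qstat display under pbs_sched,
but Maui schedules it as if it had asked for 16 cpus.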

Another problem when -lncpus is not used is that queue limits on ncpus 
are no longer enforced. How can I specify queue limits so that a job 
with -lnodes=1:ppn=n cannot run in a given queue if n is larger than 32? 
I tried with nodect, but it did not work any better.
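
For reference, the limits I tried look roughly like this (queue name
made up):

    # only enforced when jobs actually request -lncpus
    qmgr -c "set queue parallel resources_max.ncpus = 32"
    # nodect counts node specifications, not ppn, so it does not help
    qmgr -c "set queue parallel resources_max.nodect = 1"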

>However, if I specify nodes=1:ppn=2 I get the expected behaviour.
>
>I'm not sure if this is a Torque bug or a Moab bug!

This is observed with pbs_sched and Maui.

>>With older versions of PBS Pro, probably more similar to
>>Torque then than it is today, using -lnodes on Altix did
>>not work very well: memory requests were ignored. I do not
>>know if this problem would appear with Torque, though.
>
>Were they ignored by PBS Pro or were they getting set
>and the kernel was ignoring them ?

By PBS Pro.

>>- Fourthly, when I submit a sequential job followed
>>by a 2-cpu job, the first job gets a cpuset with
>>cpu 0 and the second a cpuset with cpus 1 and 2. This
>>is pretty annoying: the second job should get cpus 2 and
>>3 so that they are on the same node.
>
>The scheduler doesn't usually spread jobs across nodes if
>it doesn't have to, so I suspect here you mean this is
>across NUMA nodes.

Indeed. I fix this by not allowing truly sequential jobs: even a one-cpu 
job gets a complete NUMA node. This way, the scheduler does not need any 
knowledge of cpusets.
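
The effect of the wrapper is simply to round the cpu request up to whole
NUMA nodes; on this machine (2 cpus per NUMA node) that amounts to
something like this (variable names made up, sketch only):

    # round the requested cpu count n up to a multiple of the NUMA node
    # size, so even a "sequential" job gets a complete NUMA node
    node_cpus=2
    ppn=$(( (n + node_cpus - 1) / node_cpus * node_cpus ))

so n=1 becomes ppn=2 and n=3 becomes ppn=4.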

>>In fact, if cpus 0 and 2 were busy, I would expect the job
>>to remain queued.
>
>Hmm, I think that has to be a local site policy decision.

Agreed.

>>As I want to make this work with pbs_sched or Maui because
>>of budget constraints, the best way for me is to make sure
>>that all the jobs use one or more complete nodes.
>
>Do you mean a complete compute node or a complete NUMA node ?
>
>If NUMA, I don't think there's any way defined to request
>them in the PBS spec..

A complete NUMA node, and that is why I need a wrapper script.

>>- Fifthly, the cpusets contain all the nodes for memory,
>>instead of just the nodes needed according to the memory
>>request.
>
>Correct, that's something that I want to be able to solve
>with the NUMA support.
>
>>I guess that I can probably easily change Torque to
>>restrict the memory, provided that I use the dummy
>>qsub script described above.
>
>Well you can restrict the memory with ulimits, but
>that won't control which NUMA node it ends up being
>allocated on..

If you write 4-7 into the mems file of the cpuset, all the memory for 
the job will be allocated on these NUMA nodes; that is what I mean by 
"restrict".
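
Concretely, with the cpuset filesystem mounted where Torque expects it
(mount point and job id below are only examples):

    # confine the job's memory allocations to NUMA nodes 4-7
    echo 4-7 > /dev/cpuset/torque/1234.myserver/mems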

Michel Béland
Réseau québécois de calcul de haute performance

