[torqueusers] numa problems

David Beer dbeer at adaptivecomputing.com
Fri Sep 30 09:14:45 MDT 2011



----- Original Message -----
> Hello everyone!
> I sent this message before, but I don't know if it arrived correctly,
> so I'll try again. (Sorry if this is a dupe.)
> 
> 
> We're just starting out with Torque, but we've run into a problem. We
> have a 48-core AMD system (4 sockets with 12 cores each). Linux
> sees this as 8 NUMA nodes with 6 cores each.
> I've tried compiling Torque 3.0.2 with --enable-cpuset and
> --enable-numa-support. (I also tried without cpuset, but the result
> was the same; I even got an error telling me I had to mount
> /dev/cpuset, even without this switch.)

NUMA support is implemented on top of cpusets, so yes, you'll get the same result whether or not you pass the --enable-cpuset switch. You will definitely need to mount the cpuset filesystem (the /dev/cpuset from your error message) to get things working.
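
If you want to confirm whether a cpuset filesystem is actually mounted before starting pbs_mom, a small check against /proc/mounts is enough. The following is only a sketch in Python: the function name is made up for illustration, and it assumes the mount shows up either with filesystem type "cpuset" or as a "cgroup" mount carrying the "cpuset" option, which is how it appears on many kernels.

```python
def cpuset_mount_points(mounts_text):
    """Return mount points that look like a cpuset filesystem.

    mounts_text is the content of /proc/mounts: one mount per line,
    with fields device, mount point, fstype, options, dump, pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mount_point, fstype = fields[1], fields[2]
        options = fields[3].split(",")
        # Assumption: a legacy cpuset mount may be reported as type
        # "cgroup" with "cpuset" among its options.
        if fstype == "cpuset" or (fstype == "cgroup" and "cpuset" in options):
            points.append(mount_point)
    return points


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            found = cpuset_mount_points(f.read())
    except OSError:
        found = []
    if found:
        print("cpuset mounted at:", ", ".join(found))
    else:
        print("no cpuset filesystem mounted")
```

If nothing is found, the usual fix is to create /dev/cpuset and mount it there (typically something like "mount -t cpuset none /dev/cpuset" as root), then restart pbs_mom.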

> Anyway, our mom.layout looks like this:
> 
> cpus=0,4,8,12,16,20    mem=0
> cpus=24,28,32,36,40,44    mem=1
> cpus=1,5,9,13,17,21    mem=2
> cpus=25,29,33,37,31,45    mem=3
> cpus=2,6,10,14,18,22    mem=4
> cpus=26,30,34,38,42,46    mem=5
> cpus=3,7,11,15,19,23    mem=6
> cpus=27,31,35,39,43,47    mem=7
> 
> It's a bit strange, but this is how it's reported by Linux.
> When I start a job with these parameters:
> 
> #PBS -N JobMPI
> #PBS -l nodes=1:ppn=4
> #PBS -m abe
> 
> It starts 4 processes in a really weird way. Sometimes it uses cores
> 0,1,2,3, sometimes 2 processes run on one core, then it jumps to
> core 24, etc.
> The system takes a big performance hit when the processes don't run
> on cores sharing the same memory, so we want to lock the tasks to
> the same node.
> 
> What am I doing wrong?

I second Chris's suggestion: please send in the output of lstopo and we'll see what to do from there. I do wonder about the CPU ordering in your mom.layout. I'm not sure that TORQUE 3.0.* is well-equipped to handle a system with that kind of interleaved layout, but send in your lstopo output and we'll help you as much as we can.
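
In the meantime, it may be worth sanity-checking the mom.layout itself. The sketch below (Python; the helper names are hypothetical and not part of TORQUE) parses the layout exactly as quoted in the message and reports any CPU that is listed more than once or not at all. It assumes the CPUs are numbered 0 through 47 and handles only the comma-separated form used in the message.

```python
from collections import Counter

# mom.layout lines exactly as quoted in the original message.
LAYOUT = """\
cpus=0,4,8,12,16,20    mem=0
cpus=24,28,32,36,40,44    mem=1
cpus=1,5,9,13,17,21    mem=2
cpus=25,29,33,37,31,45    mem=3
cpus=2,6,10,14,18,22    mem=4
cpus=26,30,34,38,42,46    mem=5
cpus=3,7,11,15,19,23    mem=6
cpus=27,31,35,39,43,47    mem=7
"""


def parse_layout(text):
    """Return a list of (cpus, mem) tuples, one per non-empty line."""
    nodes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is a set of whitespace-separated key=value fields.
        fields = dict(item.split("=", 1) for item in line.split())
        cpus = [int(c) for c in fields["cpus"].split(",")]
        nodes.append((cpus, int(fields["mem"])))
    return nodes


def check_layout(nodes, total_cpus):
    """Return (duplicates, missing) CPU ids for CPUs 0..total_cpus-1."""
    counts = Counter(cpu for cpus, _ in nodes for cpu in cpus)
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    missing = sorted(set(range(total_cpus)) - set(counts))
    return duplicates, missing


if __name__ == "__main__":
    duplicates, missing = check_layout(parse_layout(LAYOUT), total_cpus=48)
    print("listed more than once:", duplicates)
    print("never listed:", missing)
```

Run against the layout as quoted, this reports CPU 31 twice (under mem=3 and mem=7) and CPU 41 not at all. That may only be a typo in the email, but if the file on disk has the same entry it is worth correcting before looking further.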

-- 
David Beer 
Direct Line: 801-717-3386 | Fax: 801-717-3738
     Adaptive Computing
     1656 S. East Bay Blvd. Suite #300
     Provo, UT 84606


