[torqueusers] numa problems
David Beer
dbeer at adaptivecomputing.com
Fri Sep 30 09:14:45 MDT 2011
----- Original Message -----
> Hello everyone!
> I sent this message before, but I don't know if it arrived correctly,
> so I'll try again. (Sorry if this is a dupe.)
>
>
> We're just starting out with TORQUE, but we've run into a problem. We
> have a 48-core AMD system (4 sockets with 12 cores each). Linux sees
> this as 8 NUMA nodes with 6 cores each.
> I've tried compiling TORQUE 3.0.2 with --enable-cpuset and
> --enable-numa-support. (I also tried without --enable-cpuset, but the
> result was the same; I even got an error telling me I had to mount
> /dev/cpuset, even without this switch?)
NUMA support is implemented on top of cpusets, so yes, you'll get the same result whether or not you pass the --enable-cpuset switch. You will definitely need to mount the cpuset filesystem in order to get things working.
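As a quick way to confirm whether a cpuset filesystem is actually mounted, here is a minimal Python sketch. It is only an illustration: the function name cpuset_mount_points is made up for this example, and it assumes the usual /proc/mounts field order (device, mount point, filesystem type, options). On some kernels a cpuset mount shows up as type "cgroup" with a "cpuset" option, so the sketch reports both; whether TORQUE accepts the cgroup form is not something this thread establishes.

```python
def cpuset_mount_points(mounts_text):
    """Return mount points whose filesystem provides cpusets.

    mounts_text is the content of /proc/mounts (hypothetical helper,
    not part of TORQUE).
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        options = fields[3].split(",")
        # Either a native cpuset mount or a cgroup mount with the
        # cpuset controller attached.
        if fstype == "cpuset" or (fstype == "cgroup" and "cpuset" in options):
            points.append(mount_point)
    return points
```

To use it on a live system, pass it the text of /proc/mounts, e.g. cpuset_mount_points(open("/proc/mounts").read()). An empty list means nothing is mounted yet.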
> Anyway, our mom.layout looks like this:
>
> cpus=0,4,8,12,16,20 mem=0
> cpus=24,28,32,36,40,44 mem=1
> cpus=1,5,9,13,17,21 mem=2
> cpus=25,29,33,37,31,45 mem=3
> cpus=2,6,10,14,18,22 mem=4
> cpus=26,30,34,38,42,46 mem=5
> cpus=3,7,11,15,19,23 mem=6
> cpus=27,31,35,39,43,47 mem=7
>
> It's a bit strange, but this is how it's reported by Linux.
> When I start a job with these parameters:
>
> #PBS -N JobMPI
> #PBS -l nodes=1:ppn=4
> #PBS -m abe
>
> It starts 4 processes in a really weird way. Sometimes it uses cores
> 0, 1, 2, 3; sometimes 2 processes run on one core, then it jumps to
> core 24, etc.
> The system takes a big performance hit when the processes don't run
> on cores sharing the same memory, so we want to lock the tasks to the
> same NUMA node.
>
> What am I doing wrong?
I second Chris's suggestion: please send in the output of lstopo and we'll see what to do from there. I do wonder about your CPU ordering. I'm not sure that TORQUE 3.0.* is well-equipped to handle a system with that kind of layout, but once we have the lstopo output we'll help you as much as we can.
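In the meantime, it may be worth checking that the layout file lists every core exactly once. The following is a minimal Python sketch, not a TORQUE tool: the helper names parse_layout and check_layout are made up for this example, and it assumes each line has the "cpus=<comma list> mem=<n>" form quoted above (ranges such as 0-5 are not handled).

```python
from collections import Counter

# The mom.layout exactly as quoted in this thread.
LAYOUT = """\
cpus=0,4,8,12,16,20 mem=0
cpus=24,28,32,36,40,44 mem=1
cpus=1,5,9,13,17,21 mem=2
cpus=25,29,33,37,31,45 mem=3
cpus=2,6,10,14,18,22 mem=4
cpus=26,30,34,38,42,46 mem=5
cpus=3,7,11,15,19,23 mem=6
cpus=27,31,35,39,43,47 mem=7
"""

def parse_layout(text):
    """Return a list of (cpu_list, mem_value) tuples, one per line."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split "cpus=... mem=..." into a {key: value} mapping.
        fields = dict(part.split("=", 1) for part in line.split())
        cpus = [int(c) for c in fields["cpus"].split(",")]
        nodes.append((cpus, fields.get("mem")))
    return nodes

def check_layout(text, total_cpus):
    """Return (duplicates, missing) CPU ids for ids 0..total_cpus-1."""
    seen = Counter()
    for cpus, _mem in parse_layout(text):
        seen.update(cpus)
    duplicates = sorted(c for c, n in seen.items() if n > 1)
    missing = sorted(set(range(total_cpus)) - set(seen))
    return duplicates, missing

if __name__ == "__main__":
    dups, missing = check_layout(LAYOUT, 48)
    print("listed more than once:", dups)
    print("never listed:", missing)
```

Run against the layout as pasted in this thread, it reports CPU 31 listed twice (mem=3 and mem=7) and CPU 41 never listed. That may only be a typo in the email, but it is worth comparing with the file on disk.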
--
David Beer
Direct Line: 801-717-3386 | Fax: 801-717-3738
Adaptive Computing
1656 S. East Bay Blvd. Suite #300
Provo, UT 84606