[torqueusers] numa problems
Wannes Van Causbroeck
wannes.van.causbroeck at imdc.be
Thu Sep 29 06:39:11 MDT 2011
I sent this message before, but I don't know if it arrived correctly, so I'll try again. (Sorry if this is a dupe.)
We're just starting out with Torque, but we've run into a problem. We
have a 48-core AMD system (4 sockets with 12 cores each). Linux
sees this as 8 NUMA nodes with 6 cores each.
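For anyone wanting to compare against their own machine, here is a minimal sketch of how the node-to-core mapping that Linux reports can be listed. It assumes the standard sysfs location /sys/devices/system/node; the function name list_numa_nodes and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# Print each NUMA node and the CPU list the kernel assigns to it.
# The optional argument (hypothetical, for illustration) overrides the
# sysfs directory; the default is the usual Linux location.
list_numa_nodes() {
    sysfs_root="${1:-/sys/devices/system/node}"
    for node_dir in "$sysfs_root"/node[0-9]*; do
        # Skip entries without a readable cpulist (also covers the case
        # where the glob matched nothing).
        [ -r "$node_dir/cpulist" ] || continue
        printf '%s: %s\n' "$(basename "$node_dir")" "$(cat "$node_dir/cpulist")"
    done
}

list_numa_nodes
```

If numactl is installed, `numactl --hardware` should give a similar overview, including the memory attached to each node.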
I've tried compiling Torque 3.02 with --enable-cpuset and
--enable-numa-support. (I also tried without cpuset, but the result was
the same. I even got an error telling me I had to mount /dev/cpuset,
even without this switch.)
Anyway, our mom.layout looks like this:
It's a bit strange, but this is how it's reported by Linux.
When I start a job with these parameters:
#PBS -N JobMPI
#PBS -l nodes=1:ppn=4
#PBS -m abe
It starts 4 processes in a really weird way. Sometimes it uses cores
0,1,2,3, sometimes 2 processes get run on one core, then it jumps to
core 24, etc.
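One way to check whether a process is actually being confined is to look at the CPU set the kernel allows it to use. The sketch below reads the Cpus_allowed_list field from /proc/<pid>/status on Linux; the function name show_allowed_cpus is made up for illustration, and it defaults to the current shell's PID.

```shell
#!/bin/sh
# Print the CPUs a process is allowed to run on, as reported by the kernel.
# show_allowed_cpus is a hypothetical helper name; pass a PID, or omit it
# to inspect the current shell.
show_allowed_cpus() {
    pid="${1:-$$}"
    # Cpus_allowed_list is a field in /proc/<pid>/status on Linux.
    awk -v pid="$pid" '/^Cpus_allowed_list:/ { print pid ": " $2 }' "/proc/$pid/status"
}

show_allowed_cpus
```

Calling this from inside the job script, or for the PID of each started process, should show whether the allowed list is restricted to a few cores or still spans the whole machine. If it spans everything, the scheduler placement is presumably up to the kernel, which would match the jumping between cores described above.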
The system takes a big performance hit when the processes aren't run on
cores sharing the same memory, so we want to lock the tasks on the
What am I doing wrong?