[torqueusers] Help with NUMA support
Tom Rosmond
rosmond at reachone.com
Mon Nov 28 12:25:28 MST 2011
A colleague and I are trying to reconfigure a Linux system to use TORQUE
NUMA support. Here are some details of the system:
1. 48 processors: 'lstopo' output gives 8 NUMA nodes, 6 cores/node.
2. Debian Linux running the 2.6.32-5-amd64 kernel.
3. Open MPI 1.5.3, configured with 'libnuma' support.
We previously had TORQUE successfully configured and running without
NUMA support, but this wasn't satisfactory for running multiple MPI jobs
concurrently. Here are the steps we have taken:
1. Reconfigured TORQUE with --enable-num-support
2. Created 'mom.layout' in /var/spool/torque/mom_priv with:
cpus=0-5 mem=0
cpus=6-11 mem=1
cpus=12-17 mem=2
cpus=18-23 mem=3
cpus=24-29 mem=4
cpus=30-35 mem=5
cpus=36-41 mem=6
cpus=42-47 mem=7
based on the 'lstopo' output.
3. Created 'nodes' file in /var/spool/torque/server_priv with:
notus np=48 num_numa_nodes=8
where 'notus' is the host name.
4. Restarted 'pbs_mom', 'pbs_sched', and 'pbs_server'.
5. Submitted MPI jobs with, e.g., '-l nodes=4:ppn=6' for PBS resources
and 'mpirun -np 24' for MPI.
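The 'mom.layout' entries in step 2 follow a regular pattern (6 consecutive cores per NUMA node), so they can be generated rather than typed by hand. The following is a minimal Python sketch, assuming core IDs are numbered contiguously per NUMA node as in the layout above. The function name `mom_layout_lines` is hypothetical and not part of TORQUE.

```python
def mom_layout_lines(num_numa_nodes, cores_per_node):
    """Build one mom.layout line per NUMA node.

    Assumes core IDs are contiguous within each NUMA node,
    which should be checked against the 'lstopo' output.
    """
    lines = []
    for node in range(num_numa_nodes):
        first = node * cores_per_node
        last = first + cores_per_node - 1
        lines.append(f"cpus={first}-{last} mem={node}")
    return lines


if __name__ == "__main__":
    # 8 NUMA nodes with 6 cores each, as described for this system.
    for line in mom_layout_lines(num_numa_nodes=8, cores_per_node=6):
        print(line)

    # The PBS request in step 5 (nodes=4:ppn=6) should cover the
    # process count passed to mpirun (-np 24).
    nodes, ppn, mpi_procs = 4, 6, 24
    print("request matches mpirun -np:", nodes * ppn == mpi_procs)
```

Generating the file this way only guards against off-by-one mistakes in the ranges. It does not confirm that the core numbering matches the hardware.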
With this we are getting the following error messages in the
'sched_logs' file:
11/28/2011 12:10:18;0040; pbs_sched;Job;10.notus.nrl.navy.mil;Not enough of the right type of nodes available
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-0;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-1;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-2;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-3;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-4;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-5;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-6;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-7;Can not open connection to mom
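To see at a glance which objects the scheduler complained about, the log records can be split on ';' and grouped by message. This is only a sketch, assuming the field order seen in the excerpt (timestamp; event code; daemon; object type; object name; message) and that each record sits on a single line. The helper names `parse_sched_line` and `group_by_message` are hypothetical.

```python
from collections import defaultdict


def parse_sched_line(line):
    """Split one scheduler log record into named fields.

    Returns None if the line does not have the six expected fields.
    """
    parts = [p.strip() for p in line.split(";", 5)]
    if len(parts) != 6:
        return None
    keys = ("timestamp", "code", "daemon", "type", "object", "message")
    return dict(zip(keys, parts))


def group_by_message(lines):
    """Map each distinct message to the list of objects that reported it."""
    groups = defaultdict(list)
    for line in lines:
        record = parse_sched_line(line)
        if record is not None:
            groups[record["message"]].append(record["object"])
    return dict(groups)


# A few records copied from the log excerpt above.
SAMPLE = [
    "11/28/2011 12:10:18;0040; pbs_sched;Job;10.notus.nrl.navy.mil;"
    "Not enough of the right type of nodes available",
    "11/28/2011 12:20:18;0002; pbs_sched;Req;notus-0;Can not open connection to mom",
    "11/28/2011 12:20:18;0002; pbs_sched;Req;notus-1;Can not open connection to mom",
]

if __name__ == "__main__":
    for message, objects in group_by_message(SAMPLE).items():
        print(f"{message}: {', '.join(objects)}")
```

Grouping this way makes it easy to see that the connection failures are reported against the per-NUMA-node names (notus-0 through notus-7) rather than the host name 'notus'.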
What are we missing? Any suggestions or advice?
T. Rosmond