[torqueusers] Help with NUMA support

Tom Rosmond rosmond at reachone.com
Mon Nov 28 12:25:28 MST 2011


A colleague and I are trying to reconfigure a Linux system with TORQUE
NUMA support.  Here are some details of the system:

1. 48 processors: 'lstopo' output gives 8 NUMA nodes, 6 cores/node.

2. Debian Linux running the 2.6.32-5-amd64 kernel.

3. Open MPI 1.5.3, configured with 'libnuma' support.

We previously had TORQUE successfully configured and running without
NUMA support, but this wasn't satisfactory for running multiple MPI jobs
concurrently.  Here are the steps we have taken:

1. Reconfigured TORQUE with --enable-numa-support

2. Created 'mom.layout' in /var/spool/torque/mom_priv with:

cpus=0-5     mem=0
cpus=6-11    mem=1
cpus=12-17   mem=2
cpus=18-23   mem=3
cpus=24-29   mem=4
cpus=30-35   mem=5
cpus=36-41   mem=6
cpus=42-47   mem=7

based on the 'lstopo' output.

3. Created 'nodes' file in /var/spool/torque/server_priv with:

notus  np=48 num_numa_nodes=8

where 'notus' is the host name.


4. Restarted 'pbs_mom', 'pbs_sched', and 'pbs_server'.


5. Submitted MPI jobs with, e.g., '-l nodes=4:ppn=6' for PBS resources
and 'mpirun -np 24' for MPI.
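
For reference, the 'mom.layout' contents in step 2 follow a simple
pattern, and a minimal Python sketch can regenerate them from the two
topology figures. It assumes that core numbers are contiguous within
each NUMA node (0-5 on node 0, 6-11 on node 1, and so on), which has to
be confirmed against the 'lstopo' output; the function name
mom_layout_lines is hypothetical and not part of TORQUE.

```python
def mom_layout_lines(numa_nodes, cores_per_node):
    """Return one 'cpus=<first>-<last> mem=<node>' line per NUMA node.

    Assumption: cores are numbered contiguously within each NUMA node.
    """
    lines = []
    for node in range(numa_nodes):
        first = node * cores_per_node
        last = first + cores_per_node - 1
        lines.append(f"cpus={first}-{last} mem={node}")
    return lines


if __name__ == "__main__":
    # 8 NUMA nodes with 6 cores each, as reported by 'lstopo' above.
    print("\n".join(mom_layout_lines(8, 6)))
```

If 'lstopo' shows a non-contiguous numbering, the ranges would have to
be written out by hand instead.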


With this configuration we get the following error messages in the
'sched_logs' file:

11/28/2011 12:10:18;0040; pbs_sched;Job;10.notus.nrl.navy.mil;Not enough of the right type of nodes available
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-0;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-1;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-2;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-3;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-4;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-5;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-6;Can not open connection to mom
11/28/2011 12:20:18;0002; pbs_sched;Req;notus-7;Can not open connection to mom
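
The records above are ';'-separated, so a short Python sketch can group
them by message to show which names the scheduler could not reach. It
assumes each record has been rejoined onto a single line (the mail
client wrapped them) and that there are six fields; the field labels
used below are my own, not TORQUE's.

```python
from collections import defaultdict

# Hypothetical labels for the six ';'-separated fields of a record.
FIELDS = ("timestamp", "code", "daemon", "type", "object", "message")


def parse_record(line):
    """Split one log record into a dict keyed by FIELDS."""
    parts = [part.strip() for part in line.split(";", 5)]
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected record: {line!r}")
    return dict(zip(FIELDS, parts))


def objects_by_message(lines):
    """Map each distinct message to the objects it was logged for."""
    grouped = defaultdict(list)
    for line in lines:
        record = parse_record(line)
        grouped[record["message"]].append(record["object"])
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "11/28/2011 12:20:18;0002; pbs_sched;Req;notus-0;Can not open connection to mom",
        "11/28/2011 12:20:18;0002; pbs_sched;Req;notus-1;Can not open connection to mom",
    ]
    for message, objects in objects_by_message(sample).items():
        print(f"{message}: {', '.join(objects)}")
```

Grouped this way, the log shows the connection failure once for each of
the eight names notus-0 through notus-7.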


What are we missing?  Any suggestions or advice?

T. Rosmond






