[torqueusers] cpusets

R. David david at unistra.fr
Wed Nov 30 12:31:35 MST 2011


Hello,

In the Open MPI configuration file openmpi-mca-params.conf, just add:
mpi_paffinity_alone = 1
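
On the second question (how to find out which CPU a process is bound to): a cpuset by itself only restricts a process to the CPUs listed in "cpus"; it does not pin each process to its own core, so the scheduler may place two ranks on the same one. What follows is a minimal sketch for checking this, assuming a Linux /proc filesystem and a kernel recent enough to expose Cpus_allowed_list in /proc/<pid>/status. The function names (parse_cpu_list, allowed_cpus, last_cpu) are made up for illustration and are not part of Torque or Open MPI.

```python
import sys


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-1,4,8' into [0, 1, 4, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def allowed_cpus(pid):
    """CPUs the process may run on (Cpus_allowed_list in /proc/<pid>/status)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return []  # assumption: older kernels may not provide this field


def last_cpu(pid):
    """CPU the process last ran on (field 39 of /proc/<pid>/stat)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is parenthesised and may contain spaces,
    # so only split what follows the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    # fields[0] is field 3 (state), hence field 39 sits at index 36.
    return int(fields[36])


if __name__ == "__main__":
    for pid in sys.argv[1:]:
        print(pid, "allowed:", allowed_cpus(pid), "last ran on:", last_cpu(pid))
```

Run it with the PIDs of the MPI ranks as arguments, e.g. "python3 affinity.py 15861 15862" (the script name is hypothetical). Without binding, each rank would be expected to report the whole cpuset (0-1,4,8 in the example below) as allowed; with mpi_paffinity_alone = 1, each rank should report a single CPU.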




On 30 Nov 2011, at 20:14, Martin Siegert wrote:

> Hi,
> 
> we just recently started using cpusets and I do not have much experience
> with them. However, by now I noticed several times that MPI jobs
> (openmpi with TM) slow down dramatically: apparently two processes
> are using the same core (i.e., both only get 50% cpu usage) even though
> the number of cores in the cpuset equals the number of processes
> of the mpi job on the particular node.
> 
> E.g.,
> 
> top - 11:05:24 up 42 days, 22:43,  2 users,  load average: 6.99, 6.93, 6.68
> Tasks: 468 total,   8 running, 460 sleeping,   0 stopped,   0 zombie
> Cpu(s): 24.9%us,  0.2%sy,  0.0%ni, 74.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  24675188k total, 12099684k used, 12575504k free,    69968k buffers
> Swap: 16777208k total,    29932k used, 16747276k free,  9946292k cached
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 3717 user1     25   0  183m  91m  14m R 100.0  0.4  15:43.62 Clark
> 4526 user2     25   0  109m  36m 3088 R 100.0  0.2   2:02.43 mdrun
> 15863 user3     25   0  459m 163m  15m R 100.0  0.7 711:26.30 wrfm_arw.exe
> 15864 user3     25   0  452m 156m  15m R 100.0  0.6 688:28.80 wrfm_arw.exe
> 4562 user2     25   0  109m  36m 3088 R 99.7  0.2   0:23.02 mdrun
> 15861 user3     25   0  462m 165m  15m R 50.2  0.7 510:02.12 wrfm_arw.exe
> 15862 user3     25   0  465m 169m  15m R 49.9  0.7 446:21.37 wrfm_arw.exe
> 
> root at b311:~> cat /proc/15861/cpuset 
> /torque/4913985.b0
> root at b311:~> cat /proc/15862/cpuset 
> /torque/4913985.b0
> 
> (same for 15863, 15864) and
> 
> root at b311:~> ls /dev/cpuset//torque/4913985.b0
> 68  cpu_exclusive   memory_pressure     notify_on_release
> 69  cpus            memory_spread_page  sched_relax_domain_level
> 70  mem_exclusive   memory_spread_slab  tasks
> 71  memory_migrate  mems
> root at b311:~> cat /dev/cpuset/torque/4913985.b0/cpus
> 0-1,4,8
> 
> Do processes within a cpuset get bound to a particular cpu?
> If yes, how do I find out which one?
> 
> Anyway, if you have an idea what could be causing this and how to
> solve this problem, please let me know.
> 
> Thanks!
> 
> Cheers,
> Martin
> 
> -- 
> Martin Siegert
> Simon Fraser University
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

---------------------------------------------------------
  R. David - david at unistra.fr
  Head of the mesocentre
  UdS / Direction Informatique
  Tel. : 03 68 85 45 48 
---------------------------------------------------------




