[torqueusers] Torque cpusets messing up

Shenglong Wang sw77 at nyu.edu
Thu Mar 17 11:02:25 MDT 2011


Which MPI implementation did you use to build the MPI code?  Have you enabled CPU mapping (processor affinity) for mpiexec?

In my experience with MVAPICH and mpiexec: with 8 CPU cores per node, if I submit 2 jobs with 4 CPU cores each on the same node, I have to pass the flag

env VIADEV_ENABLE_AFFINITY=0 

to mpiexec; otherwise both jobs will run on the first 4 CPU cores.
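
For example, with a 4-core job the launch line could look something like this (the executable name and core count are just placeholders):

    mpiexec -np 4 env VIADEV_ENABLE_AFFINITY=0 ./my_mpi_app

With affinity disabled, the ranks are free to run anywhere inside the cpuset that Torque assigned to the job, instead of each job pinning its ranks to cores 0-3.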

Best,

Shenglong


On Mar 17, 2011, at 12:49 PM, R. David wrote:

> Hello,
> 
> We had a long mail discussion a few weeks ago about MPI processes not correctly using Torque Cpusets.
> 
> I still have the problem here. 
> 
> Here is what I observed today:
> 
> - Torque 2.5.4, CentOS 5.3
> - 8-core node, 1 core busy with a very long job (Gaussian, running for 193 hours). This job has its own cpuset, of course, containing one core (core #3)
> - I submit a job on the 7 available cores (qsub -l nodes=nodename:ppn=7). I get a 7-core cpuset: 0-2,4-7
> 
> - I start the MPI job. 5 of the 7 MPI processes each get a core of their own and go up to 100% CPU.
>   - The other 2 seem to share a core; they don't go above 50% CPU (see the quick check after this list).
> 
> - I suspend the long single-core job (qsig -s suspend); the MPI processes spread over 7 cores and each of the 7 processes gets 100% CPU.
> - Resuming the long single-core job (qsig -s resume), it lands on the last free core and climbs back to 100% CPU.
> - Stopping and restarting the 7 MPI processes => each of them gets 100% CPU.
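> 
> For reference, here is roughly how I check the cpuset contents and the placement of the processes on the node (the job id, PIDs and process name below are just placeholders):
> 
>   cat /dev/cpuset/torque/1234.server/cpus                  # cores in the job's cpuset
>   for p in $(pgrep my_mpi_app); do taskset -cp $p; done    # affinity of each MPI rank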
> 
> I don't understand why I had to suspend and resume the single-core job to finally get each of the 8 processes running on this node up to 100% of CPU time.
> 
> Do you have any clue about this?
> 
> 	Regards,
> 	R. David
> 


