[torqueusers] Torque cpusets messing up

Gareth.Williams at csiro.au Gareth.Williams at csiro.au
Sun Mar 20 20:27:21 MDT 2011



> -----Original Message-----
> From: R. David [mailto:david at unistra.fr]
> Sent: Friday, 18 March 2011 3:50 AM
> To: Torque Users Mailing List
> Subject: [torqueusers] Torque cpusets messing up
> 
> Hello,
> 
> We had a long mail discussion a few weeks ago about MPI processes not
> correctly using Torque Cpusets.
> 
> I still have the problem here.
> 
> Here is what I could observe today :
> 
> - Torque 2.5.4, Centos 5.3
> - 8 cores node, 1 core busy with a very long job (gaussian, running for
> 193 hours). This job has its own CPUset, of course, containing one core
> (core # 3)
> - I submit a job on the 7 available cores (qsub -l
> nodes=nodename:ppn=7). I get a 7-core cpuset : 0-2,4-7
> 
> - I start the MPI job. 5 of the 7 MPI processes each get a core, going
> up to 100% CPU.
>    - The 2 others seem to share a core, they don't go higher than 50%
> CPU.
> 
> - I suspend (qsig -s suspend) the long single-core job, the MPI
> processes spread over 7 cores, each of the 7 processes get 100% of CPU
> - Resuming the long single-core job (qsig -s resume), it lands on the
> final available core, and rises again to 100% of CPU.
> - Stopping / starting again the 7 mpi processes => each of them get
> 100% of CPU.
> 
> I don't understand why I had to suspend and resume the single-core job
> before each of the 8 processes running on this node could finally
> get 100% of CPU time.
> 
> Do you have any clue on this ?
> 
> 	Regards,
> 	R. David

Hi R. David,

I would expect the processes to be bound by the cpuset and constrained to run on the allocated CPUs. Does the cpuset change with the suspend/resume sequence? I suspect the cpuset is unmodified. (Aside: scheduling with suspend/resume would probably be troublesome with cpusets configured on.)

Perhaps the MPI tries to place the processes on particular cores, including ineligible ones, and the misplaced processes get doubled up with others. The affinity may be such that the processes can move to other cores (within the cpuset), but this may not happen for a long time, since the scheduler is conservative about migrating processes (which makes sense). The improvement might have been triggered by the suspend/resume, or it might have happened anyway after a 'random' amount of time.
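To check whether the set of allowed CPUs really stays the same across the suspend/resume, something like the following could be run against each MPI process before and after. It is only a sketch: the function names are made up for illustration, and it assumes a Linux kernel that reports Cpus_allowed_list in /proc/<pid>/status (older kernels may only show the Cpus_allowed hex mask).

```python
def parse_cpulist(text):
    """Expand a kernel CPU list such as '0-2,4-7' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range like '0-2' covers both endpoints.
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(pid="self"):
    """Return the CPUs a process may run on, read from /proc/<pid>/status.

    Returns None if the kernel does not expose Cpus_allowed_list.
    """
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpulist(line.split(":", 1)[1])
    return None
```

For the job described above, allowed_cpus(pid) would be expected to equal parse_cpulist("0-2,4-7") for every MPI process, both before and after the suspend/resume. A result that includes core 3, or that is narrower than the cpuset, would point at the placement rather than the scheduler.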

You may find it useful to show the last occupied core in 'top'. Typing 'f' lets you choose the fields that will be displayed; Y or ] will show which CPU a process last ran on, so you can monitor whether it moves about, or see which CPU it is bound to.
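The same information can be read outside of 'top': field 39 of /proc/<pid>/stat is the CPU the process last executed on. A minimal sketch, with hypothetical function names, assuming a Linux /proc filesystem:

```python
def last_cpu_from_stat(stat_text):
    """Extract the last-run CPU (field 39) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' before tokenising.
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 39 sits at index 39 - 3.
    return int(after_comm[36])


def last_cpu(pid):
    """Return the CPU number that process `pid` last ran on."""
    with open(f"/proc/{pid}/stat") as stat:
        return last_cpu_from_stat(stat.read())
```

Calling last_cpu(pid) every few seconds for each of the 7 MPI processes would show whether two of them keep reporting the same core, and whether any of them ever migrates.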

- Gareth

