[torquedev] cpuset support
Chris Samuel
csamuel at vpac.org
Fri Nov 16 00:22:12 MST 2007
On Tue, 13 Nov 2007, Garrick Staples wrote:
> Here's what we came up with...
Just to capture some of the other aspects of this that Garrick, Craig
and I have been discussing at SC'07. None of this is set in stone and
discussion is encouraged!
0) If we create the cpusets for a job before the prologue runs then
cluster admins are able to use the prologue to modify the cpus
allocated. This is _probably_ a bad thing, as the right place to
change this decision is probably the scheduler.
However, we should work out what to do about this. Do we check the
cpusets after the prologue to record any changes and report those
back so that the scheduler can update its view of the world, or do we
instead make the cpusets *after* the job has started to prevent this
happening in the first place?
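To illustrate the first option, here is a rough sketch of how the cpus in a job's cpuset could be compared before and after the prologue. It assumes the cpuset's cpus file uses the usual kernel list format (e.g. "0-3,8"); the function names are made up for illustration, and a real pbs_mom would do this in C.

```python
def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty cpuset or stray comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpuset_changes(allocated, current):
    """Return (added, removed) CPUs relative to what the scheduler allocated."""
    return sorted(current - allocated), sorted(allocated - current)


if __name__ == "__main__":
    before = parse_cpu_list("0-3")    # what the scheduler asked for
    after = parse_cpu_list("2-5")     # what the prologue left behind
    print(cpuset_changes(before, after))
```

If either list comes back non-empty, the mom would know the prologue changed the allocation and could report that to the scheduler.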
1) If we are using cpusets then the concept of using load average to
work out how busy a node is goes out the window.
This is because if a user submits a 1 cpu job which then fires off 20
processes by accident, all those processes will be confined to 1 core,
so the scheduler could use the other cores in the knowledge that they
are unaffected by the rogue CPU usage.
This could make life easier for the scheduler if it knows that cpusets
are enabled on this node.
2) The pbs_mom can easily tell at startup whether cpusets are enabled
by checking for the presence of the cpuset pseudo-filesystem
in /proc/filesystems. We won't try to second-guess the admin by
loading the kernel module if it's missing, either. :-)
This fact could be exported dynamically as a node property.
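As a rough sketch of that check (not the actual pbs_mom code, which would be C), something like the following would do. It assumes each line of /proc/filesystems ends with the filesystem name, optionally preceded by "nodev"; the function names are invented for illustration.

```python
def filesystems_include(text, fstype="cpuset"):
    """Return True if fstype is listed in /proc/filesystems-style text."""
    for line in text.splitlines():
        fields = line.split()
        # Lines look like "nodev<TAB>cpuset" or "<TAB>ext3";
        # the filesystem name is always the last field.
        if fields and fields[-1] == fstype:
            return True
    return False


def cpusets_available(path="/proc/filesystems"):
    """Check the running kernel; treat an unreadable file as 'not available'."""
    try:
        with open(path) as f:
            return filesystems_include(f.read())
    except OSError:
        return False


if __name__ == "__main__":
    print("cpusets enabled:", cpusets_available())
```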
3) We need some way for the pbs_mom to advertise the organisation of
which core is in which socket, and possibly higher levels of NUMA
organisational awareness for systems such as the Altix, so that the
scheduler can make decisions based upon this.
4) To do this we need to be able to work out from what is in /proc how
the system is arranged (and be able to handle the various layouts on
different architectures and kernel versions).
On recent kernels this info is fairly easy to get (at least for a
standard Intel or AMD system) but you *cannot* assume that all
sockets (aka "physical id") and cores (i.e. processor numbers) are
sequential!
We will need to collect various /proc/cpuinfo outputs along with
details of the system and the output of the 'arch' and 'uname -a'
commands to help us with this.
I'll be posting some in a bit from the systems I have access to..
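For what it's worth, here is a minimal sketch of pulling the socket layout out of /proc/cpuinfo. It assumes the x86-style format where each "processor" line is followed by a "physical id" line for the same cpu; other architectures and older kernels lay this out differently, which is exactly why we need the sample outputs. The function name is made up.

```python
from collections import defaultdict


def socket_map(cpuinfo_text):
    """Map each "physical id" to the sorted "processor" numbers in it."""
    sockets = defaultdict(list)
    processor = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between cpus
        key = key.strip()
        if key == "processor":
            processor = int(value)
        elif key == "physical id" and processor is not None:
            sockets[int(value)].append(processor)
    return {sock: sorted(cpus) for sock, cpus in sockets.items()}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        for sock, cpus in sorted(socket_map(f.read()).items()):
            print("socket", sock, "-> processors", cpus)
```

Keying the result on the ids as reported, rather than indexing an array by them, avoids assuming that socket or processor numbers are sequential.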
cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency