[torquedev] cpuset support

Garrick Staples garrick at usc.edu
Fri Jan 11 19:18:23 MST 2008


On Mon, Nov 12, 2007 at 04:17:57PM -0800, Garrick Staples alleged:
> I just bumped into Chris Samuel at his (rather barren) booth here at SC07 and I think we just designed cpuset support.
> 
> Here's what we came up with...

The first version is checked in!  Almost 2 months to the day :)

I'll do the wiki docs this weekend or Monday.

 
> On startup, pbs_mom will create /dev/cpuset/torque (with all cpus) if it
> doesn't already exist and move itself to it.  This allows the admin to stuff
> pbs_mom inside a smaller cpuset if desired by creating it in the initscript.
> We will call this the "torqueset".

Done, except that pbs_mom doesn't move itself into the torqueset.
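
For anyone who wants to poke at this by hand, here is a rough sketch of the torqueset step. It is not the code in cpuset.c. It assumes the cpuset filesystem is mounted at /dev/cpuset and exposes the cpus and mems files mentioned in this message. The name create_torqueset and its root parameter are made up for illustration; pointing root at a scratch directory lets you try it without touching real cpusets.

```python
import os


def create_torqueset(root="/dev/cpuset", name="torque"):
    """Create root/name holding every cpu and mem of the root cpuset.

    Hypothetical helper, not part of TORQUE. Returns the torqueset path.
    """
    path = os.path.join(root, name)
    if os.path.isdir(path):
        # Already created, e.g. by the admin in the initscript: leave it alone.
        return path
    os.mkdir(path)
    for fname in ("cpus", "mems"):
        # Copy the root cpuset's value so the torqueset spans everything.
        with open(os.path.join(root, fname)) as src:
            value = src.read().strip()
        with open(os.path.join(path, fname), "w") as dst:
            dst.write(value + "\n")
    return path
```

Returning early when the directory exists is what lets a smaller, admin-defined torqueset survive a pbs_mom restart.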

 
> When a job starts, pbs_mom will create a per-job cpuset under the torqueset
> with the correct cpus called the "jobset".  It will do this after prologue,
> which allows the admin to pre-create it if desired.  This happens on all nodes.

Done, but it happens before prologue, so the prologue can modify it if desired.

 
> Also, per-vnode cpusets will be created under the jobset at job start.

Done.
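
The jobset and vnodeset layout can be sketched like this. It is only an illustration of the scheme described above, not what cpuset.c does line for line. The function create_jobset and the vnode_cpus mapping (vnode number to a list of cpu numbers) are hypothetical; how pbs_mom actually picks cpus is not covered here. Every cpuset gets all of the torqueset's mems, per the plan quoted further down.

```python
import os


def _write(path, value):
    with open(path, "w") as f:
        f.write(value + "\n")


def create_jobset(torqueset, jobid, vnode_cpus):
    """Create torqueset/jobid plus one child cpuset per vnode.

    vnode_cpus is a hypothetical input: {vnode number: [cpu numbers]}.
    Returns the jobset path.
    """
    with open(os.path.join(torqueset, "mems")) as f:
        mems = f.read().strip()  # all mems go into every cpuset

    jobset = os.path.join(torqueset, jobid)
    os.makedirs(jobset, exist_ok=True)
    # The jobset must hold the union of the vnode cpus before the
    # children can be given their subsets.
    all_cpus = sorted({c for cpus in vnode_cpus.values() for c in cpus})
    _write(os.path.join(jobset, "cpus"), ",".join(map(str, all_cpus)))
    _write(os.path.join(jobset, "mems"), mems)

    for vnode, cpus in vnode_cpus.items():
        vnodeset = os.path.join(jobset, str(vnode))
        os.makedirs(vnodeset, exist_ok=True)
        _write(os.path.join(vnodeset, "cpus"), ",".join(map(str, sorted(cpus))))
        _write(os.path.join(vnodeset, "mems"), mems)
    return jobset
```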

 
> pbs_mom will run the batch script inside of the jobset and all TM spawn
> requests will run in the vnodeset.

Done.
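
Placing a process in a cpuset comes down to writing its pid to that cpuset's tasks file. A minimal sketch, with attach_to_cpuset being an illustrative name rather than anything in the TORQUE source:

```python
import os


def attach_to_cpuset(cpuset_dir, pid=None):
    """Move a process into cpuset_dir by writing its pid to the tasks file.

    Hypothetical helper. Defaults to the calling process. Returns the pid.
    """
    pid = os.getpid() if pid is None else pid
    with open(os.path.join(cpuset_dir, "tasks"), "w") as f:
        f.write("%d\n" % pid)
    return pid
```

Children inherit the cpuset of their parent, so a launcher that attaches itself after fork and before exec keeps the whole process tree of the batch script or TM task inside the set.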


> You end up with cpusets that look like:
>   /dev/cpuset/torque/job-123.pbsserver.foo.edu/vnode-4

Done, but they are slightly less self-descriptive:
  /dev/cpuset/torque/123.pbsserver.foo.edu/4
 
Testers can inspect the cpus, mems, and tasks files in the various cpuset
directories.
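
If you would rather script the inspection, something like the following should do. It is a sketch that assumes the cpus and mems files use the usual kernel list format (for example "0-3,8") and that tasks holds one pid per line. parse_list and inspect_cpuset are names invented for this example.

```python
import os


def parse_list(text):
    """Expand a list string such as '0-3,8' into [0, 1, 2, 3, 8]."""
    numbers = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means an empty set
        first, _, last = part.partition("-")
        numbers.extend(range(int(first), int(last or first) + 1))
    return numbers


def inspect_cpuset(cpuset_dir):
    """Return the cpus, mems, and tasks of one cpuset directory as lists."""
    def read(name):
        with open(os.path.join(cpuset_dir, name)) as f:
            return f.read()

    return {
        "cpus": parse_list(read("cpus")),
        "mems": parse_list(read("mems")),
        "tasks": [int(pid) for pid in read("tasks").split()],
    }
```

For example, inspect_cpuset("/dev/cpuset/torque/123.pbsserver.foo.edu/4") would report what a single vnodeset of that job contains.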


> Job exit will consist of ensuring the cpusets are empty (killing processes)
> before removing them.

Done, though it's not particularly smart or reliable about it.
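
To make the exit sequence concrete, here is one way it could look. This is a sketch under assumptions, not the logic in cpuset.c: it kills whatever is listed in tasks, re-reads the file a bounded number of times, and removes child cpusets before their parent. The name remove_cpuset and the retries and delay parameters are invented; kill and rmdir are injectable only so the function can be exercised against an ordinary directory.

```python
import os
import signal
import time


def remove_cpuset(cpuset_dir, kill=os.kill, rmdir=os.rmdir, retries=10, delay=0.1):
    """Kill leftover tasks in cpuset_dir and its children, then remove it.

    Hypothetical helper. Returns True if the cpuset was removed, False if
    tasks were still present after the last retry.
    """
    # Child cpusets (the vnodesets) have to go before the parent can.
    for entry in sorted(os.listdir(cpuset_dir)):
        child = os.path.join(cpuset_dir, entry)
        if os.path.isdir(child):
            if not remove_cpuset(child, kill, rmdir, retries, delay):
                return False

    tasks_file = os.path.join(cpuset_dir, "tasks")
    for _ in range(retries):
        with open(tasks_file) as f:
            pids = [int(pid) for pid in f.read().split()]
        if not pids:
            rmdir(cpuset_dir)
            return True
        for pid in pids:
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # exited between the read and the kill
        time.sleep(delay)  # give the kernel a moment to reap the tasks
    return False
```

Bounding the retries means a stuck task (say, one blocked in uninterruptible I/O) results in a False return instead of a hung pbs_mom.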

 
> Exclusive cpusets can't be used because of suspended jobs.
> 
> All mems will be added to all cpusets unless someone comes up with another idea.
> 
> This seems pretty simple to implement, doesn't require any build deps, and
> makes sense to me.  Any thoughts?

All of the cpuset code is in src/resmom/linux/cpuset/cpuset.c.  It has a bunch
of FIXME notes.  To build it, run configure with --enable-cpuset.  Any cpuset
code outside of cpuset.c must be wrapped in 'PENABLE_LINUX26_CPUSETS'.

The code is ugly, but works.  It needs cleaning.

Now we need the "smarts".  pbs_mom needs to discover the topology, export it to
pbs_server somehow, and then pbs_server/moab gets to schedule individual cpus.
