[torqueusers] Altix cpusets

Dave Jackson jacksond at clusterresources.com
Thu Oct 27 16:32:09 MDT 2005


Jeroen,

  A new snapshot has been released.  Please let us know how it works.

Dave

On Fri, 2005-10-28 at 08:16 +1000, Jeroen van den Muyzenberg wrote:
> Hi Dave,
> 
> Yes, I can confirm the missing 'S'. I must have inadvertently deleted it
> while making up the patch.
> 
> Have had some good discussions here yesterday on how best to assign
> cpus/nodes/bricks to jobs via cpusets and will start
> experimenting/coding them up. We're primarily interested because we
> don't have a uniform amount of memory/brick and would like to optimise
> the placement of jobs to cpus based on memory requirements as well as
> locating multi-cpu jobs as close together as possible to local memory.
> 
> Thanks,
> Jeroen
> 
> On Thu, 27 Oct 2005, Dave Jackson wrote:
> 
> > Jeroen,
> >
> >  Thanks for the patch.  The changes have been made but before they are
> > checked in, can you verify that the following is what was intended:
> >
> > #ifdef CPUSETS_FIRST_CPU
> >    for (i = CPUSETS_FIRST_CPU;i < nCPUS;i++) /* CPUSET not CPUSETS */
> > #else
> >    for (i = 0;i < nCPUS;i++)
> > #endif
> >
> >  Note the comment.
> >
> >  If you can confirm, we will check in the changes and roll out a new
> > snapshot.
> >
> > Thanks,
> > Dave
> >
> > On Thu, 2005-10-27 at 11:07 +1000, Jeroen van den Muyzenberg wrote:
> >> Hi,
> >>
> >> I've had the chance to play on an Altix 3700 before it joins our
> >> existing Altix in production next week and have been experimenting with
> >> using cpusets, with little initial success.
> >>
> >> Turns out there were two problems. A cpuset name can be a maximum of 8
> >> characters, and the string (cQueueName in start_exec.c) holding this
> >> name didn't have space for the terminating null. Also, cQueueName was
> >> uninitialised and so started out holding garbage, and the strncpy used
> >> to create the cpuset name doesn't append a null terminator if none is
> >> found within the copied length of the source string (strncat does
> >> append one, but needs a terminated destination to begin with).
> >>
> >> We also intend to start using bootcpusets, and the existing code doesn't
> >> account for that, i.e. it will start placing jobs from CPU 0 onwards
> >> even if that CPU is already in another cpuset.
> >>
> >> Attached is a patch that addresses all these issues. For bootcpuset support,
> >> there needs to be a define in pbs_config.h
> >>
> >> #define CPUSETS_FIRST_CPU X
> >>
> >> where X is the first CPU outside the defined bootcpuset.
> >>
> >> Looking forward to seeing this work in production next week.
> >>
> >> Further improvements would be the ability to specify the type of memory
> >> access regime for the cpuset, and a better cpu allocation algorithm that
> >> would try to pack multi-cpu jobs onto the same node/brick if at all
> >> possible.
> >>
> >> Cheers,
> >> Jeroen
> >>
> >> Jeroen van den Muyzenberg
> >> CSIRO High Performance Scientific Computing
> >> Bureau of Meteorology/CSIRO HPCCC -
> >> High Performance Computing and Communications Centre
> >> Ph: +61 3 9669 8111 Fax: +61 3 9669 8112
> >> Jeroen.vandenMuyzenberg at csiro.au
> >> _______________________________________________
> >> torqueusers mailing list
> >> torqueusers at supercluster.org
> >> http://www.supercluster.org/mailman/listinfo/torqueusers
> >
> 
> Jeroen van den Muyzenberg
> CSIRO High Performance Scientific Computing
> Bureau of Meteorology/CSIRO HPCCC -
> High Performance Computing and Communications Centre
> Ph: +61 3 9669 8111 Fax: +61 3 9669 8112
> Jeroen.vandenMuyzenberg at csiro.au

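For reference, the cpuset-name problem described in the original message (an 8-character limit, no room for the terminating null, and an uninitialised buffer) can be sketched as below. This is only an illustration under stated assumptions, not the code from the patch: build_cpuset_name, its parameters, and CPUSET_NAME_MAX are hypothetical names, and the way the real start_exec.c combines the queue name with other parts is not shown in the thread.

```c
#include <assert.h>
#include <string.h>

/* Assumption: 8-character limit on cpuset names, as stated in the thread. */
#define CPUSET_NAME_MAX 8

/*
 * Hypothetical helper (not from TORQUE): builds a cpuset name from a
 * queue name and a suffix. The result is truncated to fit dstsize and
 * is always null-terminated.
 */
static void build_cpuset_name(char *dst, size_t dstsize,
                              const char *queue, const char *suffix)
{
    assert(dst != NULL && dstsize > 0);

    /* Start from an empty, terminated string instead of stack garbage. */
    dst[0] = '\0';

    /* strncat always writes a terminating null, so leave one byte for it. */
    strncat(dst, queue, dstsize - 1);
    strncat(dst, suffix, dstsize - 1 - strlen(dst));
}
```

A caller would declare the buffer with one extra byte for the terminator, e.g. char name[CPUSET_NAME_MAX + 1], and pass sizeof name as dstsize.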

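The bootcpuset handling discussed above comes down to starting the CPU scan at CPUSETS_FIRST_CPU rather than at 0. A minimal sketch of that idea follows. The pick_cpus function, the in_use array, and the fallback value 4 are assumptions made for illustration; in the thread the define lives in pbs_config.h and the real allocation loop is in the TORQUE source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: normally set in pbs_config.h; 4 is only an example value. */
#ifndef CPUSETS_FIRST_CPU
#define CPUSETS_FIRST_CPU 4
#endif

/*
 * Hypothetical helper (not from TORQUE): selects up to `wanted` free CPUs,
 * never choosing a CPU below CPUSETS_FIRST_CPU so the boot cpuset is left
 * alone. in_use[i] is nonzero when CPU i already belongs to a cpuset.
 * Returns the number of CPUs written to `chosen`.
 */
static int pick_cpus(const int *in_use, int ncpus, int wanted, int *chosen)
{
    int i;
    int found = 0;

    assert(in_use != NULL && chosen != NULL);

    for (i = CPUSETS_FIRST_CPU; i < ncpus && found < wanted; i++) {
        if (!in_use[i])
            chosen[found++] = i;
    }
    return found;
}
```

A first-fit scan like this makes no attempt to keep a multi-cpu job on one node/brick; that remains the further improvement mentioned in the original message.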
