[torqueusers] Altix cpusets

Jeroen van den Muyzenberg Jeroen.vandenMuyzenberg at csiro.au
Thu Oct 27 16:16:32 MDT 2005


Hi Dave,

Yes, I can confirm the missing 'S'. I must have inadvertently deleted it
while making up the patch.

We had some good discussions here yesterday on how best to assign
cpus/nodes/bricks to jobs via cpusets, and I will start experimenting
with and coding up the ideas. We're primarily interested because we
don't have a uniform amount of memory per brick and would like to
optimise the placement of jobs on cpus based on memory requirements, as
well as placing the cpus of multi-cpu jobs as close together, and as
close to their local memory, as possible.

Thanks,
Jeroen

On Thu, 27 Oct 2005, Dave Jackson wrote:

> Jeroen,
>
>  Thanks for the patch.  The changes have been made but before they are
> checked in, can you verify that the following is what was intended:
>
> #ifdef CPUSETS_FIRST_CPU
>    for (i = CPUSETS_FIRST_CPU;i < nCPUS;i++) /* CPUSET not CPUSETS */
> #else
>    for (i = 0;i < nCPUS;i++)
> #endif
>
>  Note the comment.
>
>  If you can confirm, we will check in the changes and roll out a new
> snapshot.
>
> Thanks,
> Dave
>
> On Thu, 2005-10-27 at 11:07 +1000, Jeroen van den Muyzenberg wrote:
>> Hi,
>>
>> I've had the chance to play on an Altix 3700 before it joins our
>> existing Altix in production next week and have been experimenting with
>> using cpusets, with little initial success.
>>
>> Turns out there were two problems. A cpuset name can be a maximum of 8
>> characters, and the string (cQueueName in start_exec.c) holding this
>> name didn't have space for the terminating null. Also, cQueueName was
>> being initialised with garbage, and the strncpy used to create the
>> cpuset name doesn't append a null terminator if one isn't found within
>> the copied length of the source string, while the strncat relies on
>> the destination already being terminated.
>>
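
A minimal sketch of the kind of fix described above, not the actual patch
to start_exec.c. The helper build_cpuset_name, the macro CPUSET_NAME_MAX
and the queue-name-plus-suffix layout are assumptions made for
illustration; only the 8-character limit and the strncpy/strncat usage
come from the message.

```c
#include <string.h>

#define CPUSET_NAME_MAX 8   /* limit quoted in the message */

/* Hypothetical helper, not TORQUE code: builds a cpuset name from a
 * queue name and a suffix, truncated to CPUSET_NAME_MAX characters.
 * The buffer has one extra byte for the terminating null. */
static void build_cpuset_name(char out[CPUSET_NAME_MAX + 1],
                              const char *queue, const char *suffix)
{
    size_t used;

    /* Zero the buffer so it never starts out holding garbage. */
    memset(out, 0, CPUSET_NAME_MAX + 1);

    /* strncpy() writes no terminator when queue is at least
     * CPUSET_NAME_MAX characters long, so terminate explicitly. */
    strncpy(out, queue, CPUSET_NAME_MAX);
    out[CPUSET_NAME_MAX] = '\0';

    /* strncat() appends at most n characters and then a terminator,
     * so n is the space left, not counting the terminator. */
    used = strlen(out);
    strncat(out, suffix, CPUSET_NAME_MAX - used);
}
```

With a 9-byte buffer, build_cpuset_name(name, "batch", "12345") leaves
"batch123" in name: 8 characters plus the terminating null.
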
>> We also intend to start using bootcpusets, and the existing code doesn't
>> account for that, i.e. it will start placing jobs from CPU 0 onwards
>> regardless of whether that CPU is already in another cpuset.
>>
>> Attached is a patch that addresses all these issues. For bootcpuset
>> support, there needs to be a define in pbs_config.h:
>>
>> #define CPUSETS_FIRST_CPU X
>>
>> where X is the first CPU outside the defined bootcpuset.
>>
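
To illustrate how such a define would take effect, here is a small sketch
of a CPU search that skips the bootcpuset. The function first_free_cpu,
the in_use array and the value 4 are hypothetical and chosen only for the
example; the loop bounds follow the #ifdef form quoted earlier in this
thread.

```c
/* Assumed value for illustration: CPUs 0-3 belong to the bootcpuset. */
#define CPUSETS_FIRST_CPU 4

/* Hypothetical helper, not TORQUE code: returns the lowest-numbered
 * free CPU a job may use, or -1 if there is none. in_use[i] is
 * nonzero when CPU i is already assigned to a job. */
static int first_free_cpu(const int *in_use, int nCPUS)
{
    int i;

#ifdef CPUSETS_FIRST_CPU
    /* Start past the bootcpuset so its CPUs are never handed out. */
    for (i = CPUSETS_FIRST_CPU; i < nCPUS; i++)
#else
    for (i = 0; i < nCPUS; i++)
#endif
    {
        if (!in_use[i])
            return i;
    }
    return -1;
}
```

On an 8-CPU machine where only CPU 4 is busy, this returns 5 rather than
0, because CPUs 0-3 are never considered.
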
>> Looking forward to seeing this work in production next week.
>>
>> Further improvements would be the ability to specify the type of memory
>> access regime for the cpuset, and a better cpu allocation algorithm that
>> would try to pack multi-cpu jobs onto the same node/brick if at all
>> possible.
>>
>> Cheers,
>> Jeroen
>>
>> Jeroen van den Muyzenberg
>> CSIRO High Performance Scientific Computing
>> Bureau of Meteorology/CSIRO HPCCC -
>> High Performance Computing and Communications Centre
>> Ph: +61 3 9669 8111 Fax: +61 3 9669 8112
>> Jeroen.vandenMuyzenberg at csiro.au
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>

Jeroen van den Muyzenberg
CSIRO High Performance Scientific Computing
Bureau of Meteorology/CSIRO HPCCC -
High Performance Computing and Communications Centre
Ph: +61 3 9669 8111 Fax: +61 3 9669 8112
Jeroen.vandenMuyzenberg at csiro.au

