[torqueusers] torque 3.0.3 on uv

Gareth.Williams at csiro.au Gareth.Williams at csiro.au
Sat Jan 7 05:47:54 MST 2012


> -----Original Message-----
> From: Troy Baer [mailto:tbaer at utk.edu]
> Sent: Thursday, 5 January 2012 5:35 AM
> To: David Beer; Torque Users Mailing List
> Subject: Re: [torqueusers] torque 3.0.3 on uv
> 
> On Wed, 2012-01-04 at 09:53 -0700, David Beer wrote:
> > ----- Original Message -----
> > > I’ve started configuring torque 3.0.3 on an SGI UV system (following
> > > http://www.clusterresources.com/torquedocs/1.7torqueonnuma.shtml )
> > > and am having problems.
> > >
> > > I started with a working non-numa 3.0.3 setup as a sanity check.
> > >
> > > I configured it with --enable-numa-support and made a nodes file:
> > >
> > > cherax-1 np=48 num_numa_nodes=6
> > >
> > > and mom.layout
> > >
> > > #cpus=0-15 mem=0-1 /boot
> > > cpus=16-23 mem=2
> > > cpus=24-31 mem=3
> > > #cpus=32-47 mem=4-5 /user
> > > cpus=48-55 mem=6
> > > cpus=55-63 mem=7
> > > cpus=64-71 mem=8
> > > cpus=72-79 mem=9
> > >
> > > (note that some of the blades are set aside for io etc. and not all
> > > are currently on or configured).
> >
> > For me this is the first red flag. I don't know of anyone successfully
> > using non-sequential layouts (skipping a blade in the middle). We do
> > have other sites (in fact it is typical) that skip some blades at the
> > beginning or end for the boot set, but I don't think anyone is skipping
> > in the middle. Would it be possible to move that /user set to either
> > the front or the back?
> 
> The way I've handled this is to leave all the NUMA nodes in the
> mom.layout file, but then fence off the ones I don't want used by jobs
> by placing standing reservations on them in Moab and/or marking them
> offline in TORQUE.
> 
> 	--Troy

Thanks Troy,

I've taken your advice for now and am working through the problem with support from David Beer. It looks like I'll be cranking up the debugger on Monday!

Gareth

> --
> Troy Baer, HPC System Administrator
> National Institute for Computational Sciences, University of Tennessee
> http://www.nics.tennessee.edu/
> Phone:  865-241-4233
> 
> 
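The nodes file entry and the mom.layout quoted above can easily drift out of step when blades are skipped or commented out. A rough consistency check can be scripted. The sketch below uses a hypothetical helper (check_layout, not part of TORQUE) and assumes that each uncommented mom.layout line describes one NUMA node as space-separated key=value pairs, that lines starting with # are ignored, and that cpus= ranges are inclusive. It reports any CPU assigned on more than one line and compares the totals with the np and num_numa_nodes values from the nodes file.

```python
"""Hypothetical sanity check for a TORQUE mom.layout file (not part of TORQUE).

Assumption: each non-comment line describes one NUMA node as
space-separated key=value pairs, e.g. "cpus=16-23 mem=2".
"""


def parse_ids(spec: str) -> list[int]:
    """Expand an inclusive id list like "0-3,8" into [0, 1, 2, 3, 8]."""
    ids = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids


def check_layout(text: str, np: int, num_numa_nodes: int) -> list[str]:
    """Return human-readable problems found in the layout text."""
    problems = []
    seen = {}  # cpu id -> line number where it was first assigned
    nodes = 0
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out blades (e.g. /boot, /user) are skipped
        nodes += 1
        # Tokens without "=" (such as "/boot") are ignored.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "cpus" not in fields:
            problems.append(f"line {lineno}: no cpus= field")
            continue
        for cpu in parse_ids(fields["cpus"]):
            if cpu in seen:
                problems.append(
                    f"line {lineno}: cpu {cpu} already used on line {seen[cpu]}")
            else:
                seen[cpu] = lineno
    if nodes != num_numa_nodes:
        problems.append(
            f"{nodes} NUMA nodes in layout, nodes file says "
            f"num_numa_nodes={num_numa_nodes}")
    if len(seen) != np:
        problems.append(
            f"{len(seen)} distinct cpus in layout, nodes file says np={np}")
    return problems


# The layout exactly as posted in the thread.
LAYOUT = "\n".join([
    "#cpus=0-15 mem=0-1 /boot",
    "cpus=16-23 mem=2",
    "cpus=24-31 mem=3",
    "#cpus=32-47 mem=4-5 /user",
    "cpus=48-55 mem=6",
    "cpus=55-63 mem=7",
    "cpus=64-71 mem=8",
    "cpus=72-79 mem=9",
])

if __name__ == "__main__":
    # Values taken from the nodes file line "cherax-1 np=48 num_numa_nodes=6".
    for problem in check_layout(LAYOUT, np=48, num_numa_nodes=6):
        print(problem)
```

Run against the layout as posted, this reports that CPU 55 is assigned both on the cpus=48-55 line and on the cpus=55-63 line, while the node and CPU totals (6 and 48) match the nodes file. The overlap may only be a slip in copying the file to the list, but it is cheap to rule out.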


