[torqueusers] random node selection using maui?
garrick at clusterresources.com
Fri Feb 9 00:24:57 MST 2007
On Thu, Feb 08, 2007 at 06:07:06PM -0800, Peter Wyckoff alleged:
> Ok. One quick question. Let's say I have a 10,000 node cluster, so 250
> racks and I want 1,000 machines, 4 from each rack (preferably). Will
> torque and maui perform ok with this big a scheduling constraint?
> And if I tested with 25 machines but with 250 artificial properties to
> constrain against, would this be a somewhat representative test at scale?
> thanks, pete
From my own experience, I can only speak up to the ~2000 node range and
TORQUE and Maui are fine. Lots of jobs can make Maui cranky, but it is
fine with lots of nodes.
I don't think Maui scales up to 10k nodes by default. There is probably
a define that has to be bumped up. Of course, I'm sure Moab does this
stuff without breaking a sweat.
> Garrick Staples wrote:
> >On Thu, Feb 08, 2007 at 02:29:13PM -0800, Peter Wyckoff alleged:
> >>Hi Lennart,
> >>the randomization goal is to have the computation on as many different
> >>physical racks as possible. This is for data locality (i.e., latency)
> >>and IO bandwidth scaling for a distributed file system.
> >>The other way we are looking at is, just like you proposed, having a
> >>node property called rackid-XXXX where XXXX = the subnet and then when
> >>doing qsubs, requesting N/#racks of each rack type as a scheduling constraint.
> >>I was thinking that if there were a way to be completely random (pseudo
> >>random :)), we could get the same effect.
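As a rough illustration of the per-rack request described above, the sketch below builds a TORQUE-style "-l nodes=" string that asks for an equal share of nodes from each rack property. The helper name rack_spread_spec, the property names rackid1..rackid3, and the script name job.sh are assumptions made up for this example. Note that a spec like this is a hard constraint, not the "preferably" kind of spread.

```python
def rack_spread_spec(total_nodes, rack_ids):
    """Build a nodes spec requesting an (almost) equal share from each rack.

    rack_ids are node property names (hypothetical here); any remainder is
    handed out one node at a time to the first racks in the list.
    """
    per_rack, remainder = divmod(total_nodes, len(rack_ids))
    parts = []
    for i, rack in enumerate(rack_ids):
        count = per_rack + (1 if i < remainder else 0)
        if count:  # skip racks that would get zero nodes
            parts.append(f"{count}:{rack}")
    return "nodes=" + "+".join(parts)


# Example: 12 nodes spread over 3 racks (property names are assumptions).
racks = [f"rackid{n}" for n in range(1, 4)]
print("qsub -l " + rack_spread_spec(12, racks) + " job.sh")
```

With 250 racks the resulting string gets long, so it is worth testing how qsub and the scheduler handle a request of that size before relying on it.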
> >The 'ordered' scheduling in maui is largely based on the order in
> >pbs_server's nodes file. You could just presort them there and then let
> >maui work as normal.
> >So the nodes file would basically be something like this (assuming 3 racks
> >with 40 nodes per rack):
> >node001 rackid1
> >node041 rackid2
> >node081 rackid3
> >node002 rackid1
> >node042 rackid2
> >node082 rackid3
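The interleaved ordering above can be generated rather than typed by hand. This is a minimal sketch, assuming node names and rack properties follow the pattern in the example (node001..node120, rackid1..rackid3); the function name interleaved_nodes_file is made up for illustration, and a real nodes file may carry extra per-node attributes that are left out here.

```python
from itertools import zip_longest


def interleaved_nodes_file(racks):
    """Return nodes-file lines that rotate through the racks round-robin.

    racks maps a rack property name to its list of node names. Racks with
    fewer nodes simply drop out of the rotation once they are exhausted.
    """
    columns = [[(node, prop) for node in nodes] for prop, nodes in racks.items()]
    lines = []
    for row in zip_longest(*columns):  # one node from each rack per pass
        for entry in row:
            if entry is not None:
                node, prop = entry
                lines.append(f"{node} {prop}")
    return lines


# Example layout from the message: 3 racks with 40 nodes per rack.
racks = {
    f"rackid{r + 1}": [f"node{r * 40 + i:03d}" for i in range(1, 41)]
    for r in range(3)
}
print("\n".join(interleaved_nodes_file(racks)[:6]))
```

The first six lines printed match the ordering shown in the example; the full list can be written out as the nodes file.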
> >torqueusers mailing list
> >torqueusers at supercluster.org