[torqueusers] Hogging Nodes
smm at rincon.com
Fri Mar 20 08:55:14 MDT 2009
There are a lot of great suggestions, but none seem to fit our situation exactly. Yes, we only want to affect the usage in question (a particular program or two). In the general case we want all processors on all nodes to be available.
The ppn=2 solution might be the best approach if we could write a little bit of software to make half the processes do nothing.
Thank you to everyone who replied.
From: Gareth.Williams at csiro.au [mailto:Gareth.Williams at csiro.au]
Sent: Thursday, March 19, 2009 5:07 PM
To: Sarah Mulholland
Cc: torqueusers at supercluster.org
Subject: RE: [torqueusers] Hogging Nodes
> From: Prakash Velayutham [mailto:prakash.velayutham at cchmc.org]
> Sent: Friday, 20 March 2009 1:45 AM
> To: Sarah Mulholland
> Cc: torqueusers at supercluster.org
> Subject: Re: [torqueusers] Hogging Nodes
> Maybe
> NODEACCESSPOLICY SINGLEJOB
> will do?
> On Mar 18, 2009, at 7:47 PM, Sarah Mulholland wrote:
> I sent this question to the maui group over a week ago, but there was no
> answer. Perhaps this question is more appropriate to the torque group.
> I am running the maui scheduler 3.2.6 patch level 16 with torque 2.1.6. I
> am looking for a way to submit a job on some number of nodes, say 10. In
> addition to running on 10 nodes, I want exclusive use of those nodes. Is
> there a property I can set to allow that kind of scheduling? So far I'm
> submitting jobs with
> "-l nodes=10:ppn=1"
> Specifically I have a job that needs to run on 10 processors all on
> different nodes. We have two processors per node. The JOBNODEMATCHPOLICY
> EXACTNODE makes sure our job gets what it needs, but I want to prevent
> anything else from running on the second processor of the 10 nodes.
> Is there a "don't-share-the-nodes" modifier that I can set on the job
> submission? In the maui.cfg?
You seem to have the solution now, but NODEACCESSPOLICY SINGLEJOB will affect all jobs globally, which may not be desirable if you have a mixed workload. You could use the softer approach of submitting the jobs in question with
"-l nodes=10:ppn=2" and then running only 10 processes. You may need to customize the options you pass to MPI in this case, but it would affect only the usage in question and not all users.
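For illustration, here is a minimal sketch of the "-l nodes=10:ppn=2" approach, not a tested recipe for this cluster. It relies on Torque listing each host in $PBS_NODEFILE once per allocated processor, so with ppn=2 every host appears twice. The function name unique_hosts, the scratch file location, and ./my_program are hypothetical. The -np and -machinefile flags are an assumption about the MPI launcher in use and may need adjusting for other MPI implementations.

```shell
#!/bin/sh
#PBS -l nodes=10:ppn=2

# Print each host from a node file once, keeping first-seen order.
# With ppn=2 Torque lists every host twice, so this halves the list.
unique_hosts() {
    awk '!seen[$0]++' "$1"
}

# Only launch when running inside a Torque job (node file present).
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    hostfile="${TMPDIR:-/tmp}/hosts.$$"      # hypothetical scratch location
    unique_hosts "$PBS_NODEFILE" > "$hostfile"
    nprocs=$(wc -l < "$hostfile" | tr -d ' ')

    # One process per node; the second processor on each node stays idle
    # but remains allocated to this job, so nothing else is scheduled there.
    # Launcher flags are an assumption; ./my_program is a placeholder.
    mpirun -np "$nprocs" -machinefile "$hostfile" ./my_program

    rm -f "$hostfile"
fi
```

Submitted with qsub, the job holds both processors on all 10 nodes while starting only 10 processes, and no scheduler-wide setting in maui.cfg has to change.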