[Mauiusers] Patch for nodeaccespolicy SINGLEJOB and MAXPS for SMP machines

Bas van der Vlies basv at sara.nl
Tue Feb 15 07:57:03 MST 2005


Garrick Staples wrote:
> On Mon, Feb 14, 2005 at 05:45:06PM +0100, Bas van der Vlies alleged:
> 
>>At our site we run one job per node and have a MAXPS setting of 600 
>>hours and a max walltime of 120 hours. Our nodes have 2 processors. When 
>>a user submits a job, e.g.:
>>  1) qsub -I -lnodes=60:ppn=1 -lwalltime=10:00:00 ( will run )
>>  2) qsub -I -lnodes=60:ppn=2 -lwalltime=10:00:00 ( will not run: MAXPS
>>                                                    violation )
>>
>>Now when job 1 runs it allocates the whole node, and maui sees that it 
>>occupies both CPUs on each node (e.g. 2 nodes with two CPUs each = 
>>4 tasks). So the used time will be calculated as 60 * 2 * 10 = 1200 
>>hours, which is far more than allowed!
>>
>>In the next example only one job will run instead of two:
>>  qsub -I -lnodes=30:ppn=1 -lwalltime=10:00:00 ( will run )
>>  qsub -I -lnodes=30:ppn=1 -lwalltime=10:00:00 ( will not run: MAXPS
>>                                                 violation )
>>
>>I have a patch that checks whether NODEACCESSPOLICY SINGLEJOB is set. If 
>>so, it ignores the CPUs per node.
> 
> 
> I understand what you are doing (and the patch looks fine to me), and I could
> even see myself using it, but I'm not sure this is the right thing to do.  If
> nothing else, could this behaviour be a configuration option?
> 
We have to discuss what kind of option it would be. It only does a 
different calculation if NODEACCESSPOLICY SINGLEJOB is set. For the other 
policies it does exactly the same as before (this is tested). To my 
knowledge it is the right thing, otherwise users who request only a single 
proc per node have an advantage. We now see a lot of jobs that request one 
CPU on 60 nodes and use both ;-). For nodes that have more CPUs and 
SINGLEJOB set, the situation will be worse.
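
To make the arithmetic concrete, here is a minimal Python sketch of the two ways of counting. It is not the Maui code or the patch itself: the function names are made up for illustration, the "node only" count is just one reading of "ignore the CPUs per node", and the limit is written in hours as in the examples above (Maui itself counts processor-seconds).

```python
# Illustrative sketch only: not Maui source code and not the actual patch.
# All function names are hypothetical.

CPUS_PER_NODE = 2    # dual-processor nodes, as in the examples above
MAXPS_HOURS = 600    # the MAXPS limit, written in processor-hours


def requested_ps(nodes, ppn, walltime_hours):
    """Processor-hours the job asks for (nodes * ppn * walltime)."""
    return nodes * ppn * walltime_hours


def whole_node_ps(nodes, walltime_hours, cpus_per_node=CPUS_PER_NODE):
    """Processor-hours when every CPU of each dedicated node is counted."""
    return nodes * cpus_per_node * walltime_hours


def node_only_ps(nodes, walltime_hours):
    """Assumed patched behaviour under SINGLEJOB: CPUs per node are ignored."""
    return nodes * walltime_hours


def within_maxps(total_hours, limit=MAXPS_HOURS):
    """True if the accumulated usage does not exceed the limit."""
    return total_hours <= limit


if __name__ == "__main__":
    # Job 1 from above: 60 nodes, ppn=1, 10 hours.
    print(requested_ps(60, 1, 10))   # what the user asked for
    print(whole_node_ps(60, 10))     # what is counted with whole nodes
    print(node_only_ps(60, 10))      # what is counted if CPUs are ignored
    # Two 30-node, 10-hour jobs running at the same time.
    print(within_maxps(2 * whole_node_ps(30, 10)))
    print(within_maxps(2 * node_only_ps(30, 10)))
```

With the whole-node count, a ppn=1 request is charged for both CPUs, so the second 30-node job is blocked. With the node-only count, both 30-node jobs fit under the limit together.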

> What we really need is a policy on "node seconds".  It's what you are actually
> trying to control.  It would be simple in the SINGLEJOB world, and might only
> be valid there.  But I can also imagine assigning fractional seconds to jobs on
> a shared node too, but that would be complicated.
> 
For shared nodes multiple jobs can run on a node. That is why we do 
not have to do the recalculation there.

> I've always worked around this with routing queues in pbs.  First route
> to a queue with small nodes and walltime, if that fails, route to a queue with
> medium nodes and walltime, etc.  If the job doesn't fit through any of the
> queues, then you reject the job.
> 
We only want one queue with a max walltime of 120 hours, and the users 
may submit any combination as long as they do not exceed the 600-hour 
MAXPS limit. It keeps the config easy and clean.
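
As a rough illustration of "any combination", the sketch below checks a single request against both limits. It assumes usage is counted per node (the SINGLEJOB case with CPUs ignored) and in whole hours; the function names are hypothetical and this is not how Maui is configured or implemented.

```python
# Illustrative sketch only: hypothetical helpers, not Maui configuration.

MAX_WALLTIME_HOURS = 120   # queue walltime limit from this thread
MAXPS_HOURS = 600          # MAXPS limit from this thread, in hours


def job_fits(nodes, walltime_hours):
    """True if one job respects both the walltime and the MAXPS limit.

    Assumes usage is counted as nodes * walltime (CPUs per node ignored).
    """
    if nodes <= 0 or walltime_hours <= 0:
        return False
    if walltime_hours > MAX_WALLTIME_HOURS:
        return False
    return nodes * walltime_hours <= MAXPS_HOURS


def max_nodes(walltime_hours):
    """Largest node count allowed for a given walltime in whole hours."""
    if walltime_hours <= 0 or walltime_hours > MAX_WALLTIME_HOURS:
        return 0
    return MAXPS_HOURS // walltime_hours


if __name__ == "__main__":
    for hours in (10, 60, 120):
        print(hours, max_nodes(hours))
```

So a short job can be wide and a long job must be narrow, all within one queue.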


		Regards

--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************

