[Mauiusers] nodeavailabilitypolicy

Abhishek Gupta abhig at princeton.edu
Thu Dec 16 09:32:43 MST 2010


Hi Renato,
It's not that memory usage grows over time: even if I request mem=6gb 
or pmem=6gb, the job still goes to a node with total memory less than 
6GB. So I thought that by setting NODEAVAILABILITYPOLICY I would be 
able to define availability on the basis of memory.
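For example, a submission like this (script name is just a placeholder):

qsub -l nodes=1:ppn=1,mem=6gb job.sh

can still land on a node with less than 6GB of total memory.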
Just as we define np= in the nodes file, do we have to define memory 
resources somewhere too?
Thanks,
Abhi.


Renato Borges wrote:
> Hi Abhi!
>
> On Wed, Dec 15, 2010 at 7:21 PM, Abhishek Gupta <abhig at princeton.edu> wrote:
>
>     Hi,
>
>     I am trying to figure out a way to ensure that memory usage does
>     not exceed the available memory on a node. I was thinking that this
>     parameter ( NODEAVAILABILITYPOLICY COMBINED:MEM ) should check the
>     availability of a node on the basis of available memory, but it
>     does not.
>     Is there anything else I need to add to make it work?
>     NODEAVAILABILITYPOLICY COMBINED:MEM
>
>     Thanks,
>     Abhi.
>
>
> I've never used NODEAVAILABILITYPOLICY, but I have a similar problem: 
> the jobs we run at my site start out with a small memory footprint and 
> end with large amounts of data in memory (in virtualization lingo, 
> they "balloon"). Maybe this is also your case, and that is why setting 
> this parameter doesn't work?
>
> To avoid swapping, I have set a MAXJOBPERUSER variable for each 
> compute node, because all of our jobs that have an increasing memory 
> footprint come from a single user (actually, a grid account).
>
> By tweaking the MAXJOBPERUSER value, I have found a limit for each 
> node (we have a heterogeneous cluster) that runs the jobs without 
> swapping.
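>
> In maui.cfg this is set per node, with lines something like the 
> following (node names and limits here are illustrative, not my actual 
> values):
>
> NODECFG[node01] MAXJOBPERUSER=4
> NODECFG[node02] MAXJOBPERUSER=2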
>
> However, this is not ideal, because the setting applies to all jobs 
> that run on a given node, and some local users have jobs that are 
> small in memory but large in number of cores, so the limits I set for 
> the grid jobs are too restrictive for them. Whereas the grid account 
> can only run 4 jobs on an 8-core, 8GB RAM node, local users' jobs 
> could merrily run on all 8 cores simultaneously.
>
> Trying to find a better solution, I found that one can set this in 
> Torque (supposing you use Torque):
>
> qmgr -c "set queue XXX resources_min.mem=2gb"
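>
> To verify the setting afterwards, the queue attributes can be listed 
> back:
>
> qmgr -c "list queue XXX"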
>
> And this would (theoretically) assign only nodes that have at least 
> 2GB of free memory to waiting jobs in the XXX queue. I say 
> "theoretically" because I have not had luck with this setting. As I 
> said, our grid jobs balloon, so our nodes get one job per slot: 
> initially (for the first few hours) the jobs are only downloading 
> data, so there is always 2GB free. But when the memory balloons, we 
> start swapping heavily.
>
> I guess you might have more luck with that if your jobs' memory 
> footprint is more constant. Or, if some guru could teach us how to 
> "reserve" a certain amount of memory per job, I know that would suit 
> me perfectly.
>
> Cheers,
> Renato.
>  
> -- 
> Renato Callado Borges
> Lab Specialist - DFN/IF/USP
> Email: rborges at dfn.ifusp.br
> Phone: +55 11 3091 7105