[torqueusers] question on memory allocation

Lippert, Kenneth B. Kenneth.Lippert at alcoa.com
Fri Oct 27 12:03:41 MDT 2006


Thanks, I think this did the trick.  I set both MEM and PROCS to
UTILIZED.

One problem, though: I thought I could kill Maui and restart it
without affecting any jobs already running in the cluster.  Instead I
blew away about 12 jobs, each with many hours of computation already
completed.  I scrambled to restart them (from the beginning; no
checkpointing), but my boss is still not happy.

I still see that my 8-way machine is only taking 5 jobs, even though
there appears to be memory to spare.  There is another process running
on that machine (not under Torque control), but there should still be
room.  How can I see exactly what memory value Maui is looking at when
it makes its decision?  Is it in the pbsnodes output?

Thanks again.

-kenn lippert




-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Sam Rash
Sent: Thursday, October 26, 2006 4:38 PM
To: Lippert, Kenneth B.; torqueusers at supercluster.org
Subject: RE: [torqueusers] question on memory allocation

There is a parameter that controls which resource figures Maui uses:

NODEAVAILABILITYPOLICY

It can be set per resource to DEDICATED, UTILIZED, or COMBINED (the
first being what the job says it will take, the second what it is
actually using).

http://www.clusterresources.com/products/maui/docs/a.fparameters.shtml

I think there is another related parameter around there as well.
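
Something like this in maui.cfg, if memory serves (syntax
illustrative; double-check the docs page above):

```
# maui.cfg -- use utilized memory but dedicated procs when
# computing node availability (example values)
NODEAVAILABILITYPOLICY  UTILIZED:MEM  DEDICATED:PROCS
```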

also:

This reminds me of a feature request regarding Torque and enforcing
resource requirements.  Could a future version allow saying, at least
globally if not per resource (walltime, memory, etc.), that when a job
requests 1.4gb of memory and a walltime of 2 days, the 1.4gb is
entirely soft, i.e. used only for scheduling (each node has 3gb, so
only 2 jobs of this type should land there), while the 2-day walltime
is enforced (there is no way on earth our jobs should take more than
that; more like 12 hours max)?

Last time Garrick and I talked about this, the code for enforcing
these resource 'limits' was only enabled if certain resources were
defined.  Memory did not enable it, but walltime did.  So we currently
specify only walltime, but we sometimes get hung jobs due to transient
issues and would love to set a max walltime of 12-24 hours, while
letting a process with a 1.4gb memory setting that hits 2.5gb just
swap... (but still not shooting ourselves in the foot by knowingly
putting 3 x 1.4gb processes on a 3gb or 4gb box).

Has this changed since 2.1.2?  Any plans?

Garrick?  Anyone?

Thx,

-sr


Sam Rash
srash at yahoo-inc.com
408-349-7312
vertigosr37

-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Lippert,
Kenneth
B.
Sent: Thursday, October 26, 2006 11:24 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] question on memory allocation

Quick question:

 I put memory requirements (a la "-l mem=1gb") in all of my job scripts.
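
i.e., each script starts out something like this (values are just
examples):

```shell
#!/bin/sh
#PBS -l mem=1gb
#PBS -l walltime=48:00:00
#PBS -l nodes=1
cd $PBS_O_WORKDIR
./run_model       # actual program varies per job
```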

When Torque/Maui is deciding where to send the job, does it look at the
ACTUAL free physical memory available on each node, or does it look at
the memory on each node minus the sum of the "mem" values of the jobs
already running on that node?  (All the nodes have several np's, from 2
to 8.)

If my memory estimates were exact it wouldn't matter, but they aren't:
I have to estimate a little high, so if the algorithm is the latter,
Torque could think a node was "full" when it really wasn't.

Looking at the documentation, I am thinking it is the first way, but I
want to be sure.

Thanks very much.

-k
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers



