[torqueusers] multiple jobs: yes, no, maybe

Craig Tierney - NOAA Affiliate craig.tierney at noaa.gov
Fri Dec 21 10:39:37 MST 2012


Jack,

There is an obvious answer to this: "It depends."  Many HPC people think
that serial jobs are not HPC and have no business on their system.  We
take a different approach.  Most (95%) of our cycles go to large parallel
jobs.  However, we have to pre- and post-process data to get it into a
form that the big parallel model can consume.  Most of these jobs are
serial; they make up about 70% of the job count but only 5% of the CPU
time.  Now that nodes have 12, 16 or more cores, letting a single serial
process occupy a whole node is a waste.

So we pack serial jobs onto nodes, but if a job is parallel we dedicate
nodes to it.  This works well for us.  The trick with the serial jobs is
to keep one user from affecting another.  We have no swap on our nodes, so
it is easy to blow out memory.  To help guard against this we do a couple
of things.  First, we require that users specify -l vmem so we know how
much memory their jobs will use.  Second, we use cgroups directly on the
nodes to set a maximum amount of RAM that user jobs can consume, so that
system processes are not affected when users allocate too much memory.
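
For example, a serial submission here looks something like this (the 4gb
figure is purely illustrative, not our actual policy):

    qsub -l nodes=1:ppn=1 -l vmem=4gb preprocess.sh

The node-side cap is plain cgroups.  A minimal sketch, assuming the cgroup
v1 memory controller is mounted at /sys/fs/cgroup/memory; the group name
and the 60G limit are made up for illustration, pick a value that leaves
headroom for the OS on your nodes:

    # Create a memory cgroup for user jobs and cap it below physical RAM.
    mkdir /sys/fs/cgroup/memory/user_jobs
    echo 60G > /sys/fs/cgroup/memory/user_jobs/memory.limit_in_bytes
    # Job processes are then started inside this group, e.g.:
    # echo $PID > /sys/fs/cgroup/memory/user_jobs/tasks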

The more nodes you have, the easier it is to be wasteful.  If you only have
5 nodes, it makes sense to try to pack jobs together.
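
And to answer your direct question: yes, more than one job per node is
possible; whether nodes are shared is scheduler policy, not a Torque
limitation.  A minimal sketch, assuming you run Maui as the scheduler
(the file location varies by install; pbs_sched and Moab have their own
equivalent knobs):

    # maui.cfg
    NODEACCESSPOLICY SHARED

With that set, jobs that request only part of a node (e.g. qsub -l
nodes=1:ppn=1) can land on the same node until its cores are taken.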

Craig

On Thu, Dec 20, 2012 at 4:02 PM, Jack Wilkinson
<jwilkinson at stoneeagle.com> wrote:

>  We have a small cluster configured.  Four dedicated nodes and one node
> that shares the function of the headbox.
>
>  I had a user ask a question today that I wasn't sure how to answer.  In
> our current configuration, if we drop, say, 12 jobs into the queue, the
> system runs the first five; then, as a node becomes available, the next
> job in the queue starts running, until all 12 jobs have been run.  There
> is only one job on any one node at a time.
>
>  The user wanted to know if it was possible to have more than one job
> running on a node at a time.  I honestly didn't have an answer for him.
>
> My feeling is that the whole purpose of the cluster is to give the power
> of a whole node to process a job and not have to share it.
>
> Might I get some input from the crowd??
>
> Happy holidays to all!
>
> jack
>
>
> Jack Wilkinson, Programmer
> Services | VPay®
> P: 972.367.6622
> jwilkinson at stoneeagle.com
> www.stoneeagle.com
> www.vpayusa.com
> 111 W. Spring Valley Rd., #100
> Richardson, TX 75081