[torqueusers] usage of a cluster
Tim
timlee126 at yahoo.com
Tue Feb 2 11:59:44 MST 2010
Thanks Chris!
--- On Mon, 2/1/10, Chris Samuel <chris at csamuel.org> wrote:
> From: Chris Samuel <chris at csamuel.org>
> Subject: Re: [torqueusers] usage of a cluster
> To: torqueusers at supercluster.org
> Date: Monday, February 1, 2010, 7:17 PM
> Tim wrote:
>
> > Hi,
>
> Hi Tim,
>
> Can I suggest that many of these questions would be great
> to ask on the Beowulf mailing list, which is all about
> Linux clusters in general: http://www.beowulf.org/
>
> This list is mainly for questions about the Torque queuing
> system. That said...
>
> > 1. I noticed that there is no swap for any node of
> > our cluster. Is it normal for most clusters?
>
> Not really. People often argue that running without swap
> is good, but it does mean that the kernel does not have the
> freedom to page dirty file-backed pages out to swap under
> memory pressure; it has to evict them out to the files, which
> is (apparently) slower than paging them.
>
> That's especially important if they're temporary files:
> they could just get unlinked before those pages need to be
> evicted.
>
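To see for yourself whether a node has swap configured, a minimal Python sketch along these lines may help. It assumes a Linux node where /proc/meminfo is readable and reports SwapTotal and SwapFree. The helper names parse_meminfo and swap_summary are made up for this illustration and are not part of Torque.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_summary(info):
    """Describe swap usage from a dict produced by parse_meminfo."""
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    if total == 0:
        return "no swap configured"
    return f"swap: {total - free} kB used of {total} kB"


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(swap_summary(parse_meminfo(meminfo.read_text())))
    else:
        print("/proc/meminfo not found (not a Linux system?)")
```

Run inside a job, this reports on the compute node the job landed on, not on the login node.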
> > I am running my job on a node. What will happen to my job if
> > the memory is used up? Do I have no other choice but to kill
> > my job?
>
> Usually the kernel will kill that process for you.
>
> > Update:
> > The node finally ran out of memory and is not responding.
>
> Oops :-)
>
> > I emailed the administrator and after a while I found that
> > the node had been rebooted without affecting other nodes. I
> > feel lucky my job did not bring down the whole cluster.
>
> It shouldn't bring down the cluster, but if you were sharing the
> node with other users' jobs it would have killed them!
>
> > Is using up memory the kind of behaviour that administrators
> > dislike from users?
>
> Very much so!
>
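One defensive habit that goes beyond what the thread itself suggests is to cap your own job's memory, so that a runaway allocation fails inside the job instead of exhausting the node. The sketch below is only an illustration under stated assumptions: a Linux/POSIX node, Python's standard resource module, and a hypothetical helper run_with_memory_cap with a made-up greedy test program. The 1 GiB cap is an arbitrary example value, not a site recommendation.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(argv, limit_bytes):
    """Run argv in a child process whose address space is capped (POSIX only)."""

    def apply_limit():
        # Runs in the child just before exec. Only the soft limit is changed,
        # and it is never raised above the limits already in force.
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        caps = [limit_bytes] + [v for v in (soft, hard) if v != resource.RLIM_INFINITY]
        resource.setrlimit(resource.RLIMIT_AS, (min(caps), hard))

    return subprocess.run(argv, preexec_fn=apply_limit, capture_output=True, text=True)


# Hypothetical greedy "job": tries to allocate 4 GiB in one go.
GREEDY = (
    "try:\n"
    "    data = bytearray(4 * 1024**3)\n"
    "    print('allocated')\n"
    "except MemoryError:\n"
    "    print('MemoryError')\n"
)

if __name__ == "__main__":
    result = run_with_memory_cap([sys.executable, "-c", GREEDY], 1024**3)
    print(result.stdout.strip())
```

With the cap in place, the allocation is refused and the program can exit cleanly, which is friendlier to anyone sharing the node than waiting for the kernel to step in.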
> > 2. My jobs are submitted through Torque. Will Torque make the
> > newly submitted jobs wait if there are not enough
> > resources to run them?
>
> Normally no - the scheduler decides what to run and usually
> sites do not set up policies that overcommit the resources
> available.
>
> > I wonder if each user still has to check the usage status
> > of the cluster before deciding to submit new jobs? How
> > exactly?
>
> No, the whole point of a queuing system is to manage a
> situation where demand outstrips supply and so it has
> to make the decisions on what to do, not you.
>
> > By "qstat" I can see the jobs that are running and
> > by "qstat -q" I can see how many jobs in each queue
> > are running.
>
> Correct - and if your site uses Maui or Moab as the
> scheduler you can use "showq" to get even more info.
>
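The commands mentioned above ("qstat", "qstat -q" and, on Maui or Moab sites, "showq") can be collected from a script. The following is a minimal Python sketch, assuming only that the commands may or may not be installed on the machine where it runs. The helper name run_if_available is invented for this example, and the output is printed unparsed because its layout depends on the site's versions.

```python
import shutil
import subprocess
import sys


def run_if_available(argv):
    """Return the command's stdout, or None if the program is not on PATH."""
    if shutil.which(argv[0]) is None:
        return None
    done = subprocess.run(argv, capture_output=True, text=True, check=False)
    return done.stdout


if __name__ == "__main__":
    # Commands taken from the discussion above; showq exists only with Maui/Moab.
    for argv in (["qstat"], ["qstat", "-q"], ["showq"]):
        label = " ".join(argv)
        output = run_if_available(argv)
        if output is None:
            print(f"{label}: not installed on this machine", file=sys.stderr)
        else:
            print(f"--- {label} ---\n{output}")
```

Checking for the command first means the same script runs unchanged on sites with and without Maui or Moab.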
> > But how can I find info about the usage percentage
> > of all nodes, cores and memory to get the big picture
> > and decide whether I had better not submit my jobs but wait
> > for more resources to become available?
>
> Firstly, that's a site-specific query - the tools they
> use for monitoring nodes vary enormously.
>
> Secondly the whole point of a queuing system is so you
> don't have to worry about that - you submit it into the
> queue and at some point (hopefully) it will run when the
> resources are available.
>
> cheers!
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC