[torqueusers] usage of a cluster
timlee126 at yahoo.com
Tue Feb 2 11:59:44 MST 2010
--- On Mon, 2/1/10, Chris Samuel <chris at csamuel.org> wrote:
> From: Chris Samuel <chris at csamuel.org>
> Subject: Re: [torqueusers] usage of a cluster
> To: torqueusers at supercluster.org
> Date: Monday, February 1, 2010, 7:17 PM
> Tim wrote:
> > Hi,
> Hi Tim,
> Can I suggest that many of these questions would be great
> to ask on the Beowulf mailing list, which is about
> Linux clusters in general. http://www.beowulf.org/
> This list is mainly for questions about the Torque queuing
> system. That said...
> > 1. I noticed that there is no swap for any node of
> > our cluster. Is it normal for most clusters?
> Not really. People often argue that running without swap
> is good, but it does mean that the kernel does not have the
> freedom to page dirty file-backed pages out to swap under
> memory pressure; it has to evict them out to the files, which
> is (apparently) slower than paging them.
> That's especially important if they're temporary files, which
> could just get unlinked before those pages need to be written out.
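To check whether a node actually has swap configured, one option on Linux is to read /proc/meminfo. The following is only a minimal sketch, assuming the usual "Key:  value kB" layout of that file; the function name parse_meminfo and the sample values are made up for illustration.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


# Illustrative sample only; real values differ from node to node.
SAMPLE = """\
MemTotal:       16384256 kB
MemFree:          204800 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

info = parse_meminfo(SAMPLE)
has_swap = info.get("SwapTotal", 0) > 0
print("swap configured:", has_swap)
```

On a real node the same function could be fed the live file, e.g. parse_meminfo(open("/proc/meminfo").read()).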
> > I am running my job on a node. What will happen to my job if
> > the memory is used up? Do I have no other choice but to kill
> > my job?
> Usually the kernel will kill that process for you.
> > Update:
> > The node finally runs out of memory and does not respond.
> Oops :-)
> > I emailed the administrator, and after a while
> > the node is rebooted without affecting other nodes. I feel lucky
> > my job did not bring down the whole cluster.
> It shouldn't bring down the cluster, but if you were sharing the
> node with other users' jobs it would have killed them!
> > Is using up memory one kind of behaviour that the
> > administrator dislikes from the users?
> Very much so!
> > 2. My jobs are submitted through Torque. Will Torque make
> > newly submitted jobs wait if there are not enough
> > resources to run them?
> Normally no - the scheduler decides what to run and
> sites do not set up policies that overcommit the resources.
> > I wonder if each user still has to check the usage
> > of the cluster before deciding to submit new jobs? How
> > exactly?
> No, the whole point of a queuing system is to manage a
> situation where demand outstrips supply and so it has
> to make the decisions on what to do, not you.
> > By "qstat" I can see the jobs that are running and
> > by "qstat -q" I can see how many jobs in each queue
> > are running.
> Correct - and if your site uses Maui or Moab as the
> scheduler you can use "showq" to get even more info.
> > But how can I find info about the usage percentage
> > of all nodes and cores and memory to get a big picture
> > and decide if I had better not submit my jobs but wait
> > for more resources to become available?
> Firstly, that's a site-specific query - the tools they
> use for monitoring nodes vary enormously.
> Secondly the whole point of a queuing system is so you
> don't have to worry about that - you submit it into the
> queue and at some point (hopefully) it will run when the
> resources are available.
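For anyone who still wants a rough overview of node usage: Torque ships a pbsnodes command, and "pbsnodes -a" lists each node with attributes such as state and np. Whether ordinary users may run it is up to the site. The sketch below just tallies that kind of text; the function name summarize_pbsnodes and the sample listing are illustrative assumptions, not output from a real cluster.

```python
from collections import Counter


def summarize_pbsnodes(text):
    """Count nodes per state and sum np (cores) from `pbsnodes -a`-style text."""
    states = Counter()
    total_cores = 0
    for line in text.splitlines():
        # Skip blank lines and node-name lines (which are not indented).
        if not line.strip() or not line[0].isspace():
            continue
        key, sep, value = line.strip().partition(" = ")
        if not sep:
            continue
        if key == "state":
            states[value] += 1
        elif key == "np":
            total_cores += int(value)
    return states, total_cores


# Illustrative sample only; a real listing carries more attributes per node.
SAMPLE = """\
node01
     state = free
     np = 8

node02
     state = job-exclusive
     np = 8

node03
     state = down
     np = 8
"""

states, cores = summarize_pbsnodes(SAMPLE)
print(dict(states), cores)
```

This only gives a snapshot for curiosity; the decision about when a job runs still belongs to the scheduler.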
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
> torqueusers mailing list
> torqueusers at supercluster.org