[torqueusers] usage of a cluster

Tim timlee126 at yahoo.com
Tue Feb 2 11:59:44 MST 2010


Thanks Chris! 

--- On Mon, 2/1/10, Chris Samuel <chris at csamuel.org> wrote:

> From: Chris Samuel <chris at csamuel.org>
> Subject: Re: [torqueusers] usage of a cluster
> To: torqueusers at supercluster.org
> Date: Monday, February 1, 2010, 7:17 PM
> Tim wrote:
> 
> > Hi,
> 
> Hi Tim,
> 
> Can I suggest that many of these questions would be better
> asked on the Beowulf mailing list, which is all about
> Linux clusters in general: http://www.beowulf.org/
> 
> This list is mainly for questions about the Torque queuing
> system.   That said...
> 
> > 1. I noticed that there is no swap on any node of
> > our cluster. Is that normal for most clusters?
> 
> Not really. People often argue that running without swap
> is good, but it does mean that the kernel does not have the
> freedom to page dirty file-backed pages out to swap under
> memory pressure; it has to evict them out to the files, which
> is (apparently) slower than paging them.
> 
> That's especially important if they're temporary files, since
> they could just get unlinked before those pages need to be
> evicted.
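
For anyone who wants to check whether a node has swap configured, here is a minimal Python sketch. It assumes a Linux node, where /proc/meminfo reports a SwapTotal field in kB. The function names and the sample text are made up for illustration and are not taken from any real node.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def has_swap(meminfo):
    """True if the parsed meminfo reports a non-zero SwapTotal."""
    return meminfo.get("SwapTotal", 0) > 0


# Illustrative sample only; on a real node, read the text from /proc/meminfo.
SAMPLE = """\
MemTotal:       16431248 kB
MemFree:          204812 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

info = parse_meminfo(SAMPLE)
print("swap configured:", has_swap(info))
```

On a compute node the same functions can be fed the contents of /proc/meminfo, for example parse_meminfo(open("/proc/meminfo").read()).
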
> 
> > I am running my job on a node. What will happen to my job if
> > the memory is used up? Do I have no other choice but to kill
> > my job?
> 
> Usually the kernel will kill that process for you.
> 
> > Update:
> > The node finally ran out of memory and stopped responding.
> 
> Oops :-)
> 
> > I emailed the administrator and after a while I found
> > the node had been rebooted without affecting other nodes.
> > I feel lucky my job did not bring down the whole cluster.
> 
> It shouldn't bring down the cluster, but if you were sharing the
> node with other users' jobs it would have killed them!
> 
> > Is using up memory the kind of behaviour that administrators
> > dislike from users?
> 
> Very much so!
> 
> > 2. My jobs are submitted through Torque. Will Torque make
> > newly submitted jobs wait if there are not enough
> > resources to run them?
> 
> Normally no - the scheduler decides what to run, and usually
> sites do not set up policies that overcommit the resources
> available.
> 
> > I wonder if each user still has to check the usage status
> > of the cluster before deciding to submit new jobs? If so,
> > how exactly?
> 
> No, the whole point of a queuing system is to manage a
> situation where demand outstrips supply and so it has
> to make the decisions on what to do, not you.
> 
> > By "qstat" I can see the jobs that are running and
> > by "qstat -q" I can see how many jobs in each queue
> > are running.
> 
> Correct - and if your site uses Maui or Moab as the
> scheduler you can use "showq" to get even more info.
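
As a rough illustration of summarising "qstat" output, the Python sketch below counts jobs by state (R for running, Q for queued). It assumes the default six-column job listing with the state in the second-to-last column. The sample listing, job ids and user names are invented, and the layout may differ between Torque versions.

```python
from collections import Counter


def count_job_states(qstat_text):
    """Count jobs per state letter in default `qstat`-style output."""
    states = Counter()
    for line in qstat_text.splitlines():
        cols = line.split()
        # Skip blank lines, the column header and the dashed separator.
        if len(cols) < 6 or cols[0].lower() == "job" or cols[0].startswith("-"):
            continue
        # Assumed layout: id, name, user, time used, state, queue.
        states[cols[-2]] += 1
    return states


# Invented sample; real output would come from running qstat on a login node.
SAMPLE = """\
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
101.headnode              sim_a            alice           01:12:03 R batch
102.headnode              sim_b            bob                    0 Q batch
103.headnode              sim_c            alice           00:40:10 R batch
"""

for state, count in sorted(count_job_states(SAMPLE).items()):
    print(state, count)
```

On a real system the text could be captured with subprocess.run(["qstat"], capture_output=True, text=True).stdout and passed to the same function.
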
> 
> > But how can I find the usage percentage of all nodes,
> > cores and memory, to get the big picture and decide
> > whether I had better not submit my jobs but wait
> > for more resources to become available?
> 
> Firstly, that's a site-specific question - the tools sites
> use for monitoring nodes vary enormously.
> 
> Secondly the whole point of a queuing system is so you
> don't have to worry about that - you submit it into the
> queue and at some point (hopefully) it will run when the
> resources are available.
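
If a site does let users run Torque's "pbsnodes -a", a rough core-usage figure can be derived from its output. The Python sketch below is only that, a sketch. It assumes each node block has an "np = N" line and a "jobs = slot/jobid, slot/jobid, ..." line with one entry per busy core. Other Torque versions may format the jobs line differently, and the node names and job ids are invented.

```python
def summarize_nodes(pbsnodes_text):
    """Return (used_cores, total_cores) from `pbsnodes -a`-style text."""
    used = total = 0
    for line in pbsnodes_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # node-name lines and blank lines have no "key = value"
        key, value = key.strip(), value.strip()
        if key == "np":
            total += int(value)
        elif key == "jobs" and value:
            # Assumption: one comma-separated "slot/jobid" entry per busy core.
            used += len(value.split(","))
    return used, total


# Invented sample in the assumed format.
SAMPLE = """\
node01
     state = job-exclusive
     np = 4
     ntype = cluster
     jobs = 0/101.headnode, 1/101.headnode, 2/103.headnode, 3/103.headnode

node02
     state = free
     np = 4
     ntype = cluster
     jobs = 0/104.headnode
"""

used, total = summarize_nodes(SAMPLE)
print(f"{used} of {total} cores in use")
```

This only gives a snapshot of core usage. It says nothing about memory or about what the scheduler will start next, which is why leaving the decision to the queue is usually the better approach.
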
> 
> cheers!
> Chris
> -- 
>   Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 


      

