[torqueusers] Preventing compute node "starvation"

Dave Ulrick d-ulrick at comcast.net
Thu Sep 19 09:24:21 MDT 2013

On Wed, 18 Sep 2013, David Beer wrote:

> This might involve a bit of manual work, but you could do this with a root
> cpuset. The idea in a nutshell is to reserve a core or two for the OS,
> nagios, pbs_mom daemon, etc. and let jobs use the rest. The way to do this
> on a 12 core node would be to enable cpusets in TORQUE and in the nodes
> file say that it only has 11 cores (or 10 if you want to reserve 2 for
> these things). Then you can either trust the OS to schedule these other
> processes onto the unused core, or you can manually make sure that these
> processes run under that cpuset.
> As for whether this is needed: pbs_mom should use a minimal
> amount of resources once a job is actually active. The amount it uses can
> be greater if it is a larger node and you are running lots of small jobs on
> the node, but even so it shouldn't be a huge amount of resources. The only
> things pbs_mom should need to do for a node that is already filled with
> jobs is send a status to pbs_server every 45 seconds by default (this can
> be configured) and respond to pbs_server's poll requests every 45 seconds
> (this is also configurable). There will be one poll request per job. I
> don't know how much CPU nagios uses, but typically people haven't had to
> use this solution except on large-scale NUMA systems (usually > 1000 cores),
> which have somewhat better support for doing it and are often running many
> more jobs per node.
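
For reference, here is a minimal sketch of the core-reservation arithmetic
described above. It is only an illustration under stated assumptions: the
helper names and the hostname "node01" are hypothetical, the choice to
reserve the highest-numbered cores is arbitrary, and the pinning helper uses
plain Linux CPU affinity rather than an actual cpuset. The "name np=N" form
is the usual nodes file entry; check your own TORQUE installation.

```python
import os


def split_cores(total_cores, reserved):
    """Return (system_cores, job_cores) as lists of core IDs."""
    if not 0 < reserved < total_cores:
        raise ValueError("reserved must be between 1 and total_cores - 1")
    cores = list(range(total_cores))
    # Assumption: the highest-numbered cores are set aside for the OS,
    # nagios, pbs_mom, etc. Which cores to reserve is a site decision.
    return cores[-reserved:], cores[:-reserved]


def nodes_file_line(hostname, total_cores, reserved):
    """Build a nodes file entry that advertises only the job cores."""
    _, job_cores = split_cores(total_cores, reserved)
    return f"{hostname} np={len(job_cores)}"


def pin_to_system_cores(pid, system_cores):
    """Restrict a process to the reserved cores (Linux only).

    This sets CPU affinity; it is a stand-in for, not an instance of,
    placing the process in a cpuset.
    """
    os.sched_setaffinity(pid, set(system_cores))


if __name__ == "__main__":
    # A 12-core node with one core held back for system processes.
    print(nodes_file_line("node01", 12, 1))
```

On a 12-core node this advertises 11 cores to the scheduler when one core is
reserved, or 10 when two are reserved, matching the numbers in the quoted
message.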

I like this suggestion, but I doubt my users would accept giving up
cores for the sake of node stability. We use Moab's "JOBNODEMATCHPOLICY
EXACTNODE" facility to give a job exclusive use of a node, so our users
are used to being able to use _everything_ on the node, including all 12
cores. I think a computational job running on a node that guarantees
resources for crucial system tasks will perform better than a job that
loads the node so heavily that I/O, network traffic, etc. are hindered,
but my users don't see that (yet). Perhaps we'll have to suffer through
a job that persistently kills nodes before I can sell them on giving up
cores to system processes. Too bad!

Dave Ulrick
d-ulrick at comcast.net
