[torqueusers] process using more CPUs than requested

Troy Baer tbaer at utk.edu
Wed Mar 2 17:41:54 MST 2011


On Thu, 2011-03-03 at 11:29 +1100, Christopher Samuel wrote:
> On 19/02/11 10:53, David Beer wrote:
> > There has been talk of adding some sort of rogue
> > process killing functionality to TORQUE. From the
> > suggestions I've heard, it would work something
> > like this:
> [...]
> > What do you all think of such a feature?
> 
> I think it could be useful to some people, but I think
> you'd want to be *very* careful about how it was
> implemented.

I agree.  This is very hard to get right without some sort of
inescapable container such as cpusets or process aggregation groups
(PAGGs).

> Sites could implement it now via the healthcheck
> support in Torque anyway.

Or by running something like pbstools' reaver [1] in epilogue.parallel,
although that would be more of a post-job cleanup thing.
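
To make the post-job cleanup idea concrete, here is a rough sketch in
Python.  It is not how reaver itself works, and it is not a TORQUE API:
the function names, the UID threshold, and the idea of passing in the
set of users who still have jobs on the node are all assumptions made
for illustration.  It only selects candidate PIDs; what to do with them
(log, signal, kill) is left to the site.

```python
import os

# Assumption: UIDs at or below this value are system accounts to leave alone.
SYSTEM_UID_MAX = 999


def stray_pids(processes, allowed_uids, system_uid_max=SYSTEM_UID_MAX):
    """Return sorted PIDs owned by non-system users with no job on this node.

    processes:    iterable of (pid, uid) pairs
    allowed_uids: UIDs of users who still have a job on this node
                  (hypothetical input; obtaining it is site-specific)
    """
    allowed = set(allowed_uids)
    return sorted(
        pid for pid, uid in processes
        if uid > system_uid_max and uid not in allowed
    )


def list_processes(proc_root="/proc"):
    """Yield (pid, uid) for each process directory under proc_root (Linux only).

    The UID is the owner of /proc/<pid>, which normally matches the
    process's effective UID.
    """
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            uid = os.stat(os.path.join(proc_root, name)).st_uid
        except OSError:
            continue  # process exited between listdir and stat
        yield int(name), uid
```

An epilogue-style script could call stray_pids(list_processes(), uids)
and then signal whatever comes back.  Note that this is still racy and
says nothing about which job a process belongs to, which is exactly why
a container such as a cpuset is the more robust approach.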

	--Troy

[1] http://svn.nics.tennessee.edu/repos/pbstools/trunk/sbin/reaver


