[torqueusers] process using more CPUs than requested

Bas van der Vlies basv at sara.nl
Wed Mar 2 01:17:08 MST 2011


On 1 Mar 2011, at 17:24, Ken Nielson wrote:

> On 02/28/2011 07:02 PM, Michael Jennings wrote:
>> On Friday, 18 February 2011, at 16:53:39 (-0700),
>> David Beer wrote:
>> 
>>> There has been talk of adding some sort of rogue process killing
>>> functionality to TORQUE. From the suggestions I've heard, it would
>>> work something like this:
>>> 
>>> 1. It would be configurable.
>>> 2. It would check which users have jobs active on the pbs_mom, and
>>> it would kill all processes from other users that shouldn't be on
>>> there.
>>> 
>>> What do you all think of such a feature?
>> We would use and appreciate such a feature, but we seem to be in the
>> minority.  :-]
>> 
>> Michael
>> 
> I would like to see this go into TORQUE as well.
> 
> We need to talk about what the policies would be about its use. For 
> example, it is easy if you know when each job starts it will have 
> exclusive access to the machine. We would do a search and destroy on all 
> user processes before we start the job. But it becomes more difficult 
> when a node is shared. It becomes even more difficult when the same user 
> has multiple jobs running on the same node.
> 
> Please give us an idea of the use cases for which this feature would be 
> needed.


Another problem could be that some daemons run as an ordinary user, so there must be a list of excluded users. On one cluster where jobs get exclusive access to the node, we kill all processes that do not belong on the node in the prologue/epilogue script.
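A rough sketch of that kind of cleanup follows, in Python. It is not TORQUE code. It assumes the node is allocated exclusively, that the caller already knows the job owner's user name, and that the site keeps an exclusion list for daemons running as ordinary users. The account names in EXCLUDED_USERS and the helper names are made-up examples. Scanning /proc is Linux-specific.

```python
import os
import pwd
import signal

# Hypothetical site-specific exclusion list: system accounts and daemons
# that run as ordinary users and must never be killed.
EXCLUDED_USERS = {"root", "daemon", "nobody", "ntp"}


def rogue_pids(processes, job_owner, excluded=EXCLUDED_USERS):
    """Return sorted PIDs owned by neither the job owner nor an excluded user.

    processes: iterable of (pid, username) pairs.
    """
    allowed = set(excluded) | {job_owner}
    return sorted(pid for pid, user in processes if user not in allowed)


def list_processes(proc_root="/proc"):
    """Yield (pid, username) for every numeric entry under proc_root (Linux)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip "self", "cpuinfo", and other non-PID entries
        try:
            # /proc/<pid> is normally owned by the process's effective UID.
            uid = os.stat(os.path.join(proc_root, entry)).st_uid
        except FileNotFoundError:
            continue  # the process exited while we were scanning
        try:
            user = pwd.getpwuid(uid).pw_name
        except KeyError:
            user = str(uid)  # UID with no passwd entry
        yield int(entry), user


def kill_rogues(job_owner, dry_run=True):
    """SIGKILL rogue processes; with dry_run set, only report them."""
    for pid in rogue_pids(list_processes(), job_owner):
        if dry_run:
            print(f"would kill {pid}")
            continue
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone
```

Keeping rogue_pids separate from the /proc scan and the kill makes the policy easy to check on its own. It also leaves room for the harder cases: on a shared node, or when one user runs several jobs on the same node, the "allowed" set would have to be built per job instead of per user.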


--
Bas van der Vlies
basv at sara.nl