[torqueusers] process using more CPUs than requested

Ken Nielson knielson at adaptivecomputing.com
Tue Mar 1 09:24:35 MST 2011


On 02/28/2011 07:02 PM, Michael Jennings wrote:
> On Friday, 18 February 2011, at 16:53:39 (-0700),
> David Beer wrote:
>
>> There has been talk of adding some sort of rogue process killing
>> functionality to TORQUE. From the suggestions I've heard, it would
>> work something like this:
>>
>> 1. It would be configurable.
>> 2. It would check which users have jobs active on the pbs_mom, and
>> it would kill all processes from other users that shouldn't be on
>> there.
>>
>> What do you all think of such a feature?
> We would use and appreciate such a feature, but we seem to be in the
> minority.  :-]
>
> Michael
>
I would like to see this go into TORQUE as well.

We need to discuss the policies that would govern its use. For example,
the easy case is when you know that each job will have exclusive access
to the machine when it starts: we would do a search and destroy on all
user processes before starting the job. It becomes more difficult when a
node is shared, and more difficult still when the same user has multiple
jobs running on the same node.
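To make the first two cases concrete, here is a minimal sketch (not TORQUE code) of how a node-local sweep might pick its targets. Everything in it is hypothetical: the `Proc` record, the `select_rogue_pids` function, and the assumption that accounts with a UID below 1000 are system accounts that must never be touched. It only decides *which* PIDs to kill; enumerating processes and sending signals are left out. It deliberately does not handle the third case: when one user has several jobs on a node, ownership alone cannot tell which job a process belongs to, so that case would need per-job tracking (for example by session ID).

```python
from dataclasses import dataclass

# Assumption: UIDs below this are system accounts (root, daemons, pbs_mom).
# The real cutoff varies by site.
MIN_USER_UID = 1000


@dataclass(frozen=True)
class Proc:
    """Hypothetical per-process record: PID, owning UID, and user name."""
    pid: int
    uid: int
    user: str


def select_rogue_pids(procs, job_users, exclusive=False):
    """Return the sorted PIDs a rogue-process sweep would target on one node.

    exclusive=True:  a job with exclusive access is about to start, so every
                     ordinary user process left on the node is a target.
    exclusive=False: the node is shared, so only processes owned by users
                     with no active job on this node are targets.
    """
    targets = []
    for p in procs:
        if p.uid < MIN_USER_UID:
            continue  # never touch system accounts
        if exclusive or p.user not in job_users:
            targets.append(p.pid)
    return sorted(targets)


procs = [
    Proc(1, 0, "root"),
    Proc(200, 1001, "alice"),
    Proc(300, 1002, "bob"),
    Proc(301, 1002, "bob"),
]
# Shared node where only alice has a job: bob's processes are rogue.
print(select_rogue_pids(procs, {"alice"}))
# Exclusive node about to start a job: every user process is swept.
print(select_rogue_pids(procs, {"alice"}, exclusive=True))
```

Keeping the selection separate from the killing makes it easy to run the policy in a "report only" mode first, which seems prudent given how configurable the feature is meant to be.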

Please give us an idea of the use cases for which this feature would be 
needed.

Thanks

Ken Nielson
Adaptive Computing

