[torqueusers] [PATCH 0/3] use cgroup to limit the cpu and memory usage of jobs

André Gemünd andre.gemuend at scai.fraunhofer.de
Fri Nov 30 00:32:34 MST 2012


That PAM approach sounds very interesting.
Some MPI implementations don't support the TM API,
so afaik there is no real choice besides using SSH to launch
the sibling processes (I'm open to suggestions if that is wrong).

Rather than prompting the user interactively, I'd prefer to check
the siblings for the same jobid that the job has on the mother
superior, but that is an implementation detail.
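
A minimal sketch of that check, in Python only for brevity (a real PAM
module would be written in C). How the jobid reaches the sibling is an
assumption: here it is simply handed in as `requested_job_id`, for
example from a forwarded PBS_JOBID variable. The function name and
parameters are hypothetical and not part of torque or PAM.

```python
from typing import Iterable, Optional


def match_sibling_job(requested_job_id: Optional[str],
                      local_job_ids: Iterable[str]) -> Optional[str]:
    """Return the jobid to attach to, or None if the login should not be adopted.

    requested_job_id: jobid the incoming session claims to belong to
                      (assumed to come from the mother superior's side).
    local_job_ids:    jobids that pbs_mom is currently running on this sibling.
    """
    if not requested_job_id:
        # No jobid was passed along, so there is nothing to match against.
        return None
    if requested_job_id in set(local_job_ids):
        # Same jobid on the sibling as on the mother superior: attach to it.
        return requested_job_id
    return None
```

The point of matching on the jobid is that the result is unambiguous
even when the user has several jobs on the same node.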

I wouldn't know how to start, but maybe we can collaborate?


----- Original Message -----
> Some sites use ssh to spawn processes on the sibling nodes. This
> obviously causes the new ssh-spawned processes to run outside of
> pbs_mom control, making resource accounting and limitation impossible.
> I think this could be easily solved by using a modified PAM module for
> torque.
> Such a module needs to be rewritten to do the following:
>  * check whether the incoming user has an active job on the node
>    (already present)
>  * if yes: find the jobid of the youngest job belonging to the user
>    on the node. If PAM is able to determine whether the session is
>    interactive, it could ask the user to choose the desired jobid
>    to attach to
>  * use the tm_adopt call (libtorque) to make pbs_mom aware of the
>    new session and processes
> tm_adopt in theory should attach the new ssh-spawned session and
> processes to a cpuset (2.5.12) and to a cgroup in later versions of
> torque.
> Quick tests and code digging showed that tm_adopt is not cpuset aware
> in 2.5.12, but it should be easy to fix.
> Unfortunately I haven't yet had time to implement this in the PAM
> module, but maybe there is someone more experienced with PAM
> development who's willing to implement this? :)
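
The steps quoted above could be sketched roughly as follows. This is
Python only to show the decision logic; a real PAM module would be C
and would call tm_adopt from libtorque. The `LocalJob` record, the
function names and the `adopt` callback are all hypothetical stand-ins:
how the module would list the jobs on the node and how it would invoke
tm_adopt are left open here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class LocalJob:
    """One job running on this node (hypothetical record, not a torque type)."""
    job_id: str
    owner: str
    start_time: float  # seconds since the epoch


def pick_job_to_adopt(user: str, jobs: Sequence[LocalJob]) -> Optional[LocalJob]:
    """Return the youngest (most recently started) job owned by `user`, if any."""
    own = [job for job in jobs if job.owner == user]
    if not own:
        return None
    return max(own, key=lambda job: job.start_time)


def handle_ssh_login(user: str,
                     jobs: Sequence[LocalJob],
                     adopt: Callable[[str], None]) -> bool:
    """Decide whether an incoming ssh session is allowed and adopted.

    `adopt` stands in for the tm_adopt step: it receives the chosen jobid
    and is expected to make pbs_mom aware of the new session.
    """
    job = pick_job_to_adopt(user, jobs)
    if job is None:
        # Step 1 failed: the user has no active job on this node.
        return False
    # Steps 2 and 3: youngest job found, hand its jobid to the adopt step.
    adopt(job.job_id)
    return True
```

Keeping the adopt step behind a callback leaves the selection logic
testable without a running pbs_mom.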

André Gemünd
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemuend at scai.fraunhofer.de
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend
