[torqueusers] [PATCH 0/3] use cgroup to limit the cpu and memory usage of jobs

Lukasz Flis l.flis at cyf-kr.edu.pl
Thu Nov 29 12:03:33 MST 2012


Hi All,

Levin pointed out an interesting issue that has not been addressed yet.

Some sites use ssh to spawn processes on the sibling nodes. These
ssh-spawned processes run outside of pbs_mom control, which makes
resource accounting and limiting impossible.

I think this could be solved easily by using a modified PAM module for
torque.

Such a module would need to be rewritten to do the following:
 * check whether the incoming user has an active job on the node
(already present)

 * if yes: find the jobid of the youngest job belonging to the user on
the node. If PAM is able to determine that the session is interactive,
it could ask the user to choose the jobid to attach to

 * use the tm_adopt call (libtorque) to make pbs_mom aware of the new
session and its processes
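
A rough sketch of the selection and adopt steps above, in C. Everything
named here is an assumption for illustration only: struct user_job, the
adopt_fn callback and fake_adopt are hypothetical and are not part of
libtorque or of the existing PAM module. A real module would fill the
job list from pbs_mom's job data and pass a wrapper around tm_adopt
instead of fake_adopt.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record for one active job of the connecting user on this node. */
struct user_job {
    const char *jobid;  /* e.g. "1234.server" */
    time_t      start;  /* job start time */
};

/* Return the job with the latest start time, or NULL if there are no jobs. */
const struct user_job *youngest_job(const struct user_job *jobs, size_t n)
{
    const struct user_job *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || jobs[i].start > best->start)
            best = &jobs[i];
    }
    return best;
}

/* Hypothetical callback standing in for the real adopt call. */
typedef int (*adopt_fn)(const char *jobid, pid_t pid);

/*
 * Attach the session `pid` to the user's youngest job.
 * Returns -1 if the user has no active job (the PAM module would deny
 * access in that case), otherwise the return value of the adopt callback.
 */
int adopt_session(const struct user_job *jobs, size_t n, pid_t pid,
                  adopt_fn adopt)
{
    const struct user_job *job = youngest_job(jobs, n);
    if (job == NULL)
        return -1;
    return adopt(job->jobid, pid);
}

/* Placeholder adopt callback: only records which jobid was chosen. */
char last_adopted_jobid[64];

int fake_adopt(const char *jobid, pid_t pid)
{
    (void)pid;
    snprintf(last_adopted_jobid, sizeof(last_adopted_jobid), "%s", jobid);
    return 0;
}
```

The interactive case (asking the user which jobid to attach to) is left
out; it would replace youngest_job with a prompt through the PAM
conversation function.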

tm_adopt should in theory attach the new ssh-spawned session and its
processes to a cpuset (2.5.12) and to a cgroup in later versions of
torque. Quick tests and code digging showed that tm_adopt is not
cpuset-aware in 2.5.12, but it should be easy to fix.
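
For reference, attaching a process to a (v1) cgroup comes down to
writing its pid into the "tasks" file of the cgroup directory; children
forked afterwards stay in the same cgroup. A minimal sketch in C of what
the missing step could look like. The function name cgroup_attach_pid
and the idea of one directory per job under the cgroup mount point are
my assumptions, not what tm_adopt or the patchset actually does.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Write `pid` into <cgroup_dir>/tasks.
 * cgroup_dir is assumed to be the job's cgroup directory, for example a
 * per-job directory below the cgroup mount point (hypothetical layout).
 * Returns 0 on success, -1 on any failure.
 */
int cgroup_attach_pid(const char *cgroup_dir, pid_t pid)
{
    char path[4096];
    int len = snprintf(path, sizeof(path), "%s/tasks", cgroup_dir);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;              /* path did not fit into the buffer */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;              /* cgroup missing or no permission */

    int ok = fprintf(f, "%ld\n", (long)pid) > 0;
    if (fclose(f) != 0)         /* the write may only fail on flush/close */
        ok = 0;
    return ok ? 0 : -1;
}
```

This has to run with enough privileges to write to the cgroup
filesystem, so it belongs in pbs_mom rather than in the user's session.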

Unfortunately I haven't had time yet to implement this in the PAM
module, but maybe there is someone more experienced with PAM development
who's willing to implement this? :)



Cheers,
--
Lukasz Flis

> Levin,
> 
> Would you mind explaining why I would want to do this?  We are starting
> to use cgroups at the OS level to make sure that all user processes
> (sum) cannot blow out physical memory (as the nodes are diskless and
> have no swap).  Do your patches just make sure that cgroups are used
> to restrict memory usage of processes launched from the pbs_mom?
> 
> What if the process launched ssh, which ran a process?  Would the
> process still be constrained under cgroups?
> 
> Thanks,
> Craig
> 
> 
> On Tue, Nov 20, 2012 at 2:02 AM, levin li <levin108 at gmail.com> wrote:
> 
>     We've been using torque for a while, but it doesn't have memory
>     isolation at present. Since cgroup is now widely used for resource
>     isolation, I wrote this patchset for torque to isolate cpuset and
>     memory between jobs. We can use cgroup to replace libcpuset, which
>     is more complicated to use than cgroup.
> 
>     With this patchset, we can use qmgr to enable/disable cgroup:
> 
>     Enable:
> 
>     qmgr -c "set server cgroup_enable = True"
> 
>     Disable:
> 
>     qmgr -c "set server cgroup_enable = False"
> 
>     Before using cgroup in torque, we have to mount it first:
> 
>     mkdir /dev/torque
>     mount -t cgroup -o cpuset,memory torque /dev/torque
> 
>     I wrote an MPI program to test this function. The job script:
> 
>     -------------------------------------
>     #!/bin/bash
> 
>     #PBS -l nodes=11:ppn=1,pmem=60
> 
>     mpirun --hostfile $PBS_NODEFILE ./mpi
>     ------------------------------------
> 
>     Then we submit two jobs like this. In every MPI process we malloc
>     100M of memory and, to make sure it is backed by physical memory,
>     we write 100M of data to this memory chunk. Let's see the test
>     result:
> 
>     Before cgroup is enabled:
> 
>     [root at vkvm050]# ps -e -o args,psr,rss
>     orted -mca ess env -mca ort   0  2292
>     ./mpi                         1 106504
>     orted -mca ess env -mca ort   0  2292
>     ./mpi                         0 106500
> 
>     After cgroup is enabled:
> 
>     [root at vkvm050]# ps -e -o args,psr,rss
>     orted -mca ess env -mca ort   1  1860
>     ./mpi                         1 46448
>     orted -mca ess env -mca ort   0  1860
>     ./mpi                         0 59856
> 
> 
>     Thanks,
> 
>     levin
> 
>     levin li (3):
>       pbs_server: add cgroup_enable to server attribute
>       resmom: add mom_cgroup.[c|h] to repo
>       resmom: create cgroup for jobs to limit cpu and mem usage
> 
>      src/include/pbs_ifl.h         |    2 +
>      src/include/pbs_job.h         |    1 +
>      src/include/qmgr_svr_public.h |    1 +
>      src/include/server.h          |    1 +
>      src/resmom/Makefile.am        |    4 +-
>      src/resmom/Makefile.in        |    9 +-
>      src/resmom/catch_child.c      |    4 +
>      src/resmom/mom_cgroup.c       |  445 +++++++++++++++++++++++++++++++++++++++++
>      src/resmom/mom_cgroup.h       |   14 ++
>      src/resmom/mom_comm.c         |   10 +-
>      src/resmom/mom_main.c         |   21 ++-
>      src/resmom/start_exec.c       |   19 ++-
>      src/server/job_attr_def.c     |   13 ++
>      src/server/req_quejob.c       |    7 +
>      src/server/svr_attr_def.c     |   13 ++
>      15 files changed, 554 insertions(+), 10 deletions(-)
>      create mode 100644 src/resmom/mom_cgroup.c
>      create mode 100644 src/resmom/mom_cgroup.h
> 
>     --
>     1.7.6.1
> 
>     _______________________________________________
>     torqueusers mailing list
>     torqueusers at supercluster.org
>     http://www.supercluster.org/mailman/listinfo/torqueusers
> 
> 
> 
> 
> 


