[torqueusers] [PATCH 0/3] use cgroup to limit the cpu and memory usage of jobs

Craig Tierney - NOAA Affiliate craig.tierney at noaa.gov
Thu Nov 29 11:23:04 MST 2012


Levin,

Would you mind explaining why I would want to do this?  We are starting to
use cgroups at the OS level to make sure that all user processes, in
aggregate, cannot blow out physical memory (the nodes are diskless and have
no swap).  Do your patches just make sure that cgroups are used to restrict
memory usage of processes launched by pbs_mom?

What if a job process launched ssh, which in turn started another process?
Would that process still be constrained by the cgroup?
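
For what it's worth, the way I would check that last point is to log into
the node where the ssh-launched process ended up and look at /proc -- the
cgroup a PID belongs to is listed there.  A rough sketch (the mount point
/dev/torque is from your instructions; <pid> is just a placeholder, and
nothing here is specific to your patches):

  # on the node running the suspect process
  cat /proc/<pid>/cgroup     # each line is hierarchy-id:controllers:path
  ls /dev/torque             # per-job cgroup directories, if any

A process that is inside the job's cgroup shows the job's path next to
cpuset,memory; one that escaped just shows /.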

Thanks,
Craig


On Tue, Nov 20, 2012 at 2:02 AM, levin li <levin108 at gmail.com> wrote:

> We've been using torque for a while, but it doesn't have memory isolation
> at present.  Since cgroup is now widely used for resource isolation, I
> wrote this patchset for torque to isolate cpuset and memory between jobs.
> We can also use cgroup to replace libcpuset, which is more complicated to
> use than cgroup.
>
> With this patchset, we can use qmgr to enable/disable cgroup:
>
> Enable:
>
> qmgr -c "set server cgroup_enable = True"
>
> Disable:
>
> qmgr -c "set server cgroup_enable = False"
>
> Before using cgroup in torque, we need to mount the cgroup filesystem first:
>
> mkdir /dev/torque
> mount -t cgroup -o cpuset,memory torque /dev/torque
>
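> For anyone not familiar with the cgroup v1 interface, this is roughly what
> a per-job limit looks like when set up by hand.  The group name, CPU number
> and limit below are only illustrative; the patchset creates and manages the
> per-job groups itself:
>
> mkdir /dev/torque/<jobid>
> echo 0   > /dev/torque/<jobid>/cpuset.cpus            # CPUs the job may use
> echo 0   > /dev/torque/<jobid>/cpuset.mems            # memory nodes it may use
> echo 60M > /dev/torque/<jobid>/memory.limit_in_bytes  # cap on resident memory
> echo <pid> > /dev/torque/<jobid>/tasks                # attach the job's process
>
> With cpuset and memory co-mounted like this, cpuset.cpus and cpuset.mems
> must both be written before any task can be attached, and processes forked
> by an attached task stay in the same group.
>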
> I wrote an MPI program to test this feature; the job script:
>
> -------------------------------------
> #!/bin/bash
>
> #PBS -l nodes=11:ppn=1,pmem=60
>
> mpirun --hostfile $PBS_NODEFILE ./mpi
> ------------------------------------
>
> Then we submit two jobs like this.  In every MPI process we malloc 100M of
> memory and, to make sure it is backed by physical pages, write 100M of data
> into that chunk.  Here is the test result:
>
> Before cgroup is enabled:
>
> [root@vkvm050]# ps -e -o args,psr,rss
> orted -mca ess env -mca ort   0  2292
> ./mpi                         1 106504
> orted -mca ess env -mca ort   0  2292
> ./mpi                         0 106500
>
> After cgroup is enabled:
>
> [root@vkvm050]# ps -e -o args,psr,rss
> orted -mca ess env -mca ort   1  1860
> ./mpi                         1 46448
> orted -mca ess env -mca ort   0  1860
> ./mpi                         0 59856
>
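> A quick way to sanity-check the memory cap on a single node, without MPI,
> is to run something memory-hungry inside a job whose limit is smaller than
> what it tries to allocate (just an illustration, not part of the patchset;
> dd allocates a buffer of the block size, so its RSS climbs to roughly 200M
> here):
>
> dd if=/dev/zero of=/dev/null bs=200M count=1
>
> With the job's cgroup limit below that and no swap on the node, the
> cgroup's OOM killer should kill the dd instead of letting it eat into
> memory needed by other jobs.
>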
>
> Thanks,
>
> levin
>
> levin li (3):
>   pbs_server: add cgroup_enable to server attribute
>   resmom: add mom_cgroup.[c|h] to repo
>   resmom: create cgroup for jobs to limit cpu and mem usage
>
>  src/include/pbs_ifl.h         |    2 +
>  src/include/pbs_job.h         |    1 +
>  src/include/qmgr_svr_public.h |    1 +
>  src/include/server.h          |    1 +
>  src/resmom/Makefile.am        |    4 +-
>  src/resmom/Makefile.in        |    9 +-
>  src/resmom/catch_child.c      |    4 +
>  src/resmom/mom_cgroup.c       |  445 +++++++++++++++++++++++++++++++++++++++++
>  src/resmom/mom_cgroup.h       |   14 ++
>  src/resmom/mom_comm.c         |   10 +-
>  src/resmom/mom_main.c         |   21 ++-
>  src/resmom/start_exec.c       |   19 ++-
>  src/server/job_attr_def.c     |   13 ++
>  src/server/req_quejob.c       |    7 +
>  src/server/svr_attr_def.c     |   13 ++
>  15 files changed, 554 insertions(+), 10 deletions(-)
>  create mode 100644 src/resmom/mom_cgroup.c
>  create mode 100644 src/resmom/mom_cgroup.h
>
> --
> 1.7.6.1
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>