[torqueusers] [PATCH 0/3] use cgroup to limit the cpu and memory usage of jobs

levin li levin108 at gmail.com
Tue Nov 20 02:02:01 MST 2012


We have been using torque for a while, but it does not have memory isolation at
present. Since cgroup is now widely used for resource isolation, I wrote this
patchset for torque to isolate cpuset and memory between jobs. We can use cgroup
to replace libcpuset, which is more complicated to use than cgroup.

With this patchset, we can use qmgr to enable/disable cgroup:

Enable:

qmgr -c "set server cgroup_enable = True"

Disable:

qmgr -c "set server cgroup_enable = False"

Before using cgroup in torque, the cgroup hierarchy must be mounted:

mkdir /dev/torque
mount -t cgroup -o cpuset,memory torque /dev/torque
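
For readers who have not used the cgroup v1 interface directly, here is a
minimal sketch of what confining one process by hand below this mount point
could look like. It is not taken from the patches: the function name, the job
directory name and the values passed to it are hypothetical. Only the control
file names (cpuset.cpus, cpuset.mems, memory.limit_in_bytes, tasks) are the
standard cgroup v1 ones.

```shell
#!/bin/bash
# Hypothetical helper, not part of the patchset: confine one process to a
# per-job cgroup below an already mounted cpuset,memory hierarchy.
# usage: job_cgroup_setup ROOT JOBNAME CPUS MEMS MEM_LIMIT PID
job_cgroup_setup() {
    local root=$1 job=$2 cpus=$3 mems=$4 limit=$5 pid=$6
    local dir="$root/$job"

    # Creating a directory in the hierarchy creates the cgroup.
    mkdir -p "$dir" || return 1

    # cpuset.cpus and cpuset.mems must both be set before a task can be
    # attached to the group.
    echo "$cpus"  > "$dir/cpuset.cpus" || return 1
    echo "$mems"  > "$dir/cpuset.mems" || return 1

    # Memory limit for the whole group; suffixes such as K, M and G are accepted.
    echo "$limit" > "$dir/memory.limit_in_bytes" || return 1

    # Move the process into the group.
    echo "$pid"   > "$dir/tasks" || return 1
}
```

Run as root, something like `job_cgroup_setup /dev/torque job1 0 0 60M $$`
would place the current shell in a group restricted to CPU 0, memory node 0 and
60M of memory. How the patchset names its groups and which values it writes is
not shown here.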

I wrote an MPI program to test this feature. The job script:

-------------------------------------
#!/bin/bash
 
#PBS -l nodes=11:ppn=1,pmem=60
 
mpirun --hostfile $PBS_NODEFILE ./mpi
------------------------------------

We then submit two such jobs. Every MPI process mallocs 100M of memory and, to
make sure it is backed by physical memory, writes 100M of data to that chunk.
In the ps output below, psr is the processor the process is running on and rss
is its resident set size in KB. The test results:

Before cgroup is enabled:

[root@vkvm050]# ps -e -o args,psr,rss
orted -mca ess env -mca ort   0  2292
./mpi                         1 106504
orted -mca ess env -mca ort   0  2292
./mpi                         0 106500

After cgroup is enabled:

[root@vkvm050]# ps -e -o args,psr,rss
orted -mca ess env -mca ort   1  1860
./mpi                         1 46448
orted -mca ess env -mca ort   0  1860
./mpi                         0 59856
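
The comparison above can also be checked mechanically. The following is only a
sketch: the function name is hypothetical, and the limit of 61440 KB in the
usage line assumes that pmem=60 ends up as a 60 MB limit, which is an
assumption based on the numbers above rather than something stated here.

```shell
#!/bin/bash
# Hypothetical check, not part of the patchset: read the output of
# `ps -e -o args,psr,rss` on stdin and print the RSS (in KB) of every ./mpi
# process, marking the ones above a limit given in KB.
# usage: ps -e -o args,psr,rss | check_mpi_rss LIMIT_KB
check_mpi_rss() {
    awk -v limit="$1" '
        $1 == "./mpi" {
            rss = $NF    # rss is the last column of the ps output
            status = (rss + 0 > limit + 0) ? "OVER" : "ok"
            printf "%s %s\n", rss, status
        }'
}
```

For example: `ps -e -o args,psr,rss | check_mpi_rss 61440`.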


Thanks,

levin

levin li (3):
  pbs_server: add cgroup_enable to server attribute
  resmom: add mom_cgroup.[c|h] to repo
  resmom: create cgroup for jobs to limit cpu and mem usage

 src/include/pbs_ifl.h         |    2 +
 src/include/pbs_job.h         |    1 +
 src/include/qmgr_svr_public.h |    1 +
 src/include/server.h          |    1 +
 src/resmom/Makefile.am        |    4 +-
 src/resmom/Makefile.in        |    9 +-
 src/resmom/catch_child.c      |    4 +
 src/resmom/mom_cgroup.c       |  445 +++++++++++++++++++++++++++++++++++++++++
 src/resmom/mom_cgroup.h       |   14 ++
 src/resmom/mom_comm.c         |   10 +-
 src/resmom/mom_main.c         |   21 ++-
 src/resmom/start_exec.c       |   19 ++-
 src/server/job_attr_def.c     |   13 ++
 src/server/req_quejob.c       |    7 +
 src/server/svr_attr_def.c     |   13 ++
 15 files changed, 554 insertions(+), 10 deletions(-)
 create mode 100644 src/resmom/mom_cgroup.c
 create mode 100644 src/resmom/mom_cgroup.h

-- 
1.7.6.1


