[torqueusers] [PATCH 0/3] use cgroup to limit the cpu and memory usage of jobs
levin li
levin108 at gmail.com
Tue Nov 20 02:02:01 MST 2012
We have been using Torque for a while, but it currently has no memory isolation.
Since cgroup is now widely used for resource isolation, I wrote this patchset so
that Torque isolates cpuset and memory between jobs. cgroup can also replace
libcpuset, which is more complicated to use.
With this patchset, we can use qmgr to enable/disable cgroup:
Enable:
qmgr -c "set server cgroup_enable = True"
Disable:
qmgr -c "set server cgroup_enable = False"
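One way to confirm the setting from a script is to look for the attribute in
the output of qmgr -c "print server". The sketch below assumes that a patched
server prints cgroup_enable in the usual "set server <name> = <value>" form.
The function name cgroup_enabled is hypothetical and not part of the patchset.

```shell
#!/bin/bash
# Hypothetical helper: reads the output of `qmgr -c "print server"` on stdin
# and succeeds only if it contains "set server cgroup_enable = True".
# Assumption: the patched server prints the attribute in that form.
cgroup_enabled() {
    grep -Eq '^set server cgroup_enable = True[[:space:]]*$'
}
```

Possible usage: qmgr -c "print server" | cgroup_enabled && echo "cgroup is enabled"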
Before enabling cgroup in Torque, mount the cgroup hierarchy first:
mkdir /dev/torque
mount -t cgroup -o cpuset,memory torque /dev/torque
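To check that the hierarchy is really mounted with both controllers, a script
can look it up in /proc/mounts. This is only a sketch: the function name
has_cgroup_mount is hypothetical, and it assumes a cgroup v1 mount whose
controllers appear in the options field, as with the mount command above.

```shell
#!/bin/bash
# Hypothetical helper: has_cgroup_mount MOUNTS_FILE MOUNTPOINT CONTROLLER...
# Succeeds if MOUNTS_FILE (in /proc/mounts format) lists a filesystem of type
# "cgroup" at MOUNTPOINT whose options include every CONTROLLER given.
has_cgroup_mount() {
    local mounts_file=$1 mountpoint=$2
    shift 2
    local opts c
    # Fields in /proc/mounts: device, mount point, fs type, options, ...
    opts=$(awk -v mp="$mountpoint" \
        '$2 == mp && $3 == "cgroup" { print $4; exit }' "$mounts_file")
    [ -n "$opts" ] || return 1
    for c in "$@"; do
        # Wrap in commas so that "cpu" does not match "cpuset".
        case ",$opts," in
            *",$c,"*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

Possible usage: has_cgroup_mount /proc/mounts /dev/torque cpuset memory || echo "cgroup not mounted"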
I wrote an MPI program to test this feature. The job script:
-------------------------------------
#!/bin/bash
#PBS -l nodes=11:ppn=1,pmem=60
mpirun --hostfile $PBS_NODEFILE ./mpi
------------------------------------
We then submit two jobs with this script. Each MPI process mallocs 100M of memory
and writes 100M of data to it, so that the whole chunk is backed by physical
memory. The test results:
Before cgroup is enabled:
[root@vkvm050]# ps -e -o args,psr,rss
orted -mca ess env -mca ort   0   2292
./mpi                         1 106504
orted -mca ess env -mca ort   0   2292
./mpi                         0 106500
After cgroup is enabled:
[root@vkvm050]# ps -e -o args,psr,rss
orted -mca ess env -mca ort   1   1860
./mpi                         1  46448
orted -mca ess env -mca ort   0   1860
./mpi                         0  59856
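The rss column of ps is in kilobytes, so the numbers can be compared with the
limit directly. The sketch below assumes that pmem=60 in the job script is meant
as 60 MB (the script gives no unit). The helper name within_limit is
hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: within_limit RSS_KB LIMIT_MB
# Succeeds if RSS_KB (kilobytes, as printed by ps) is at most LIMIT_MB * 1024.
within_limit() {
    local rss_kb=$1 limit_mb=$2
    [ "$rss_kb" -le $(( limit_mb * 1024 )) ]
}

# RSS values of the ./mpi processes from the two ps listings above.
# Assumption: the limit is 60 MB.
for rss in 106504 106500 46448 59856; do
    if within_limit "$rss" 60; then
        echo "$rss KB: within 60 MB"
    else
        echo "$rss KB: over 60 MB"
    fi
done
```

Under that assumption the two values measured before cgroup was enabled are over
the limit, and the two measured afterwards are within it.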
Thanks,
levin
levin li (3):
pbs_server: add cgroup_enable to server attribute
resmom: add mom_cgroup.[c|h] to repo
resmom: create cgroup for jobs to limit cpu and mem usage
src/include/pbs_ifl.h | 2 +
src/include/pbs_job.h | 1 +
src/include/qmgr_svr_public.h | 1 +
src/include/server.h | 1 +
src/resmom/Makefile.am | 4 +-
src/resmom/Makefile.in | 9 +-
src/resmom/catch_child.c | 4 +
src/resmom/mom_cgroup.c | 445 +++++++++++++++++++++++++++++++++++++++++
src/resmom/mom_cgroup.h | 14 ++
src/resmom/mom_comm.c | 10 +-
src/resmom/mom_main.c | 21 ++-
src/resmom/start_exec.c | 19 ++-
src/server/job_attr_def.c | 13 ++
src/server/req_quejob.c | 7 +
src/server/svr_attr_def.c | 13 ++
15 files changed, 554 insertions(+), 10 deletions(-)
create mode 100644 src/resmom/mom_cgroup.c
create mode 100644 src/resmom/mom_cgroup.h
--
1.7.6.1