[torqueusers] rss restriction behavior of pbs_mom

yoshihisa.munakata.hv at hitachi.com
Tue Dec 16 17:14:36 MST 2008


I use Torque 2.3.0 on my customer's cluster system.
A few jobs used excessive memory on the cluster, and sometimes cluster
nodes hung because of the memory shortage.
To solve this problem, I looked into Torque's source code and noticed that
pbs_mom doesn't check the rss memory size of a single-node job.
In torque-2.3.0/src/resmom/mom_main.c, the function job_over_limit()
calls mom_over_limit(), but mom_over_limit() doesn't check the amount of the
"mem" resource, so it returns successfully to job_over_limit().
If mom_over_limit() returns successfully and the job is a single-node job,
job_over_limit() also returns successfully to examine_all_polled_jobs().
As a result, nothing is checked for the "mem" resource in this sequence.
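
To make the sequence above concrete, here is a minimal sketch in C of the control flow as I understand it. This is not the Torque source: the struct, its fields, and the sketch_ function names are hypothetical, and "other" is only a stand-in for whatever limits mom_over_limit() really checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified job record; NOT Torque's real job structure. */
struct sketch_job {
    size_t node_count;    /* number of nodes the job runs on */
    size_t mem_limit_kb;  /* requested "mem" limit */
    size_t rss_used_kb;   /* resident set size currently in use */
    size_t other_limit;   /* stand-in for a limit that is checked */
    size_t other_used;    /* current usage for that stand-in limit */
};

/* Models mom_over_limit() as described in this post: some limits are
 * checked, but mem_limit_kb and rss_used_kb are never compared. */
static bool sketch_mom_over_limit(const struct sketch_job *job)
{
    return job->other_used > job->other_limit;
}

/* Models job_over_limit() as described in this post. */
static bool sketch_job_over_limit(const struct sketch_job *job)
{
    if (sketch_mom_over_limit(job))
        return true;

    /* Single-node job: return "not over limit" without any "mem" check. */
    if (job->node_count == 1)
        return false;

    /* Whatever happens for multi-node jobs is outside what this post
     * describes, so it is not modelled here. */
    return false;
}
```

With this model, a single-node job whose rss_used_kb is far above mem_limit_kb is still reported as within limits, which matches the behavior I see on the cluster.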

Is my understanding of this code correct? If so, why doesn't pbs_mom check
the size of the "mem" resource for a single-node job?

Thanks in advance
Yoshihisa Munakata
