[torqueusers] Job exceeding memory limits
David Singleton
David.Singleton at anu.edu.au
Wed Apr 18 06:11:24 MDT 2007
Steve Young wrote:
> Thanks Dave. This is why I am wondering how torque checks an OS to
> verify how much memory is being used. I suspect that when the job is
> first being started that a lot more resources are used but after it's
> underway it evens out to expected operation. I am hoping once I can find
> out how torque does it that perhaps I can do the same from command line
> to try to find out for myself why torque thinks that it needs so much
> more memory.
Basically, just add up the memory of every process belonging to the job, as reported by ps aux.
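A minimal C sketch of that sum follows. It assumes the input is the output of "ps -e -o sid=,vsz=,rss=" (Linux procps; sid is not a POSIX format specifier). It also assumes the job's processes all share one session id, which is what injob() tests in the MOM code below. sum_job_mem is a hypothetical helper name, and the VSZ/RSS values are whatever ps reports (KiB on procps).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum VSZ and RSS for every process whose session
 * id equals job_sid. ps_output is assumed to hold one "sid vsz rss"
 * triple per line with no header, as printed by
 *   ps -e -o sid=,vsz=,rss=
 * Returns the number of matching processes.
 */
static int sum_job_mem(const char *ps_output, long job_sid,
                       long *vsz_kb, long *rss_kb)
{
    int matched = 0;
    const char *line = ps_output;

    *vsz_kb = 0;
    *rss_kb = 0;
    while (*line != '\0') {
        char buf[256];
        size_t len = strcspn(line, "\n");       /* length of this line */
        size_t copy = len < sizeof buf ? len : sizeof buf - 1;
        long sid, vsz, rss;

        /* Parse one line at a time so a short line cannot spill over */
        memcpy(buf, line, copy);
        buf[copy] = '\0';
        if (sscanf(buf, "%ld %ld %ld", &sid, &vsz, &rss) == 3 &&
            sid == job_sid) {
            *vsz_kb += vsz;
            *rss_kb += rss;
            matched++;
        }

        line += len;                            /* skip past this line */
        if (*line == '\n')
            line++;
    }
    return matched;
}
```

Feed it the captured ps output and the job's session id (for example, the sid of the job's top-level shell). Parsing line by line keeps one malformed line from pulling numbers off the next one.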
> You bring up an interesting point... having MOM ignore resource usage
> for young processes. I didn't see anything on the parameters page for MOM to
> configure this. Would you mind elaborating on how you did that? =).
> Thanks in advance,
>
Fairly simplistically. These are cut-down versions of the routines in
mom_mach.c that find a job's vmem and mem, respectively (I've chopped out
the gory shared memory details but left the gratuitous macros in).
David
static memsize_t mem_sum(job *pjob)
{
    char     *id = "mem_sum";
    memsize_t memsize = 0;
    int       iproc;

    /* proc_info[] holds nproc psinfo_t entries, one per process */
    for (iproc = 0; iproc < nproc; iproc++) {
        psinfo_t *pi = &proc_info[iproc];

        /* Skip processes whose session id does not belong to this job */
        if (!injob(pjob, pi->pr_sid))
            continue;

        /*
         * A feeble attempt to ignore the memory use of recently forked
         * processes - ignore processes less than 2 seconds old
         */
        if (time_now < (time_t) ISECS(pi->pr_start) + 2)
            continue;

        /* Add the virtual size, skipping values at or above PROC_MEM_MAX */
        if (PRVMEM_TO_BYTES(pi->pr_size) < PROC_MEM_MAX)
            memsize += PRVMEM_TO_BYTES(pi->pr_size);
    }
    return (memsize);
}
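Stripped of the psinfo_t and macro details, the filter in mem_sum() reduces to the standalone sketch below. The struct proc_sample type and the job_mem_sum function are hypothetical stand-ins for the MOM's proc_info[] table and the two routines here. min_age plays the role of the hard-coded 2 seconds, and per_proc_max plays the role of PROC_MEM_MAX.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-process sample; field names are illustrative only. */
struct proc_sample {
    long               sid;         /* session id of the process */
    time_t             start;       /* start time, seconds since the epoch */
    unsigned long long vmem_bytes;  /* virtual size in bytes */
    unsigned long long rss_bytes;   /* resident size in bytes */
};

/*
 * Sum virtual and resident memory for processes in session job_sid,
 * skipping processes younger than min_age seconds and any per-process
 * value at or above per_proc_max.
 */
static void job_mem_sum(const struct proc_sample *procs, size_t nprocs,
                        long job_sid, time_t now, time_t min_age,
                        unsigned long long per_proc_max,
                        unsigned long long *vmem, unsigned long long *rss)
{
    *vmem = 0;
    *rss = 0;
    for (size_t i = 0; i < nprocs; i++) {
        const struct proc_sample *p = &procs[i];

        if (p->sid != job_sid)
            continue;                   /* not part of this job */
        if (now < p->start + min_age)
            continue;                   /* too young: ignore entirely */
        if (p->vmem_bytes < per_proc_max)
            *vmem += p->vmem_bytes;
        if (p->rss_bytes < per_proc_max)
            *rss += p->rss_bytes;
    }
}
```

A process forked one second before the sample is simply left out of both totals. It is counted on the next sample, once it has had time to settle.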
static memsize_t resi_sum(job *pjob)
{
    char     *id = "resi_sum";
    memsize_t resisize = 0;
    int       iproc;

    /* proc_info[] holds nproc psinfo_t entries, one per process */
    for (iproc = 0; iproc < nproc; iproc++) {
        psinfo_t *pi = &proc_info[iproc];

        /* Skip processes whose session id does not belong to this job */
        if (!injob(pjob, pi->pr_sid))
            continue;

        /*
         * A feeble attempt to ignore the memory use of recently forked
         * processes - ignore processes less than 2 seconds old
         */
        if (time_now < (time_t) ISECS(pi->pr_start) + 2)
            continue;

        /* Add the resident size, skipping values at or above PROC_MEM_MAX */
        if (PRRSS_TO_BYTES(pi->pr_rssize) < PROC_MEM_MAX)
            resisize += PRRSS_TO_BYTES(pi->pr_rssize);
    }
    return (resisize);
}
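The routines above read Solaris-style psinfo_t entries. On Linux, the same inputs (session id, start time, virtual size, resident size) can be read from /proc/<pid>/stat, as documented in proc(5): field 6 is the session, field 22 is starttime in clock ticks since boot, field 23 is vsize in bytes, and field 24 is rss in pages. The sketch below assumes that layout. parse_proc_stat and proc_age_seconds are hypothetical names. A process's age is the system uptime (first field of /proc/uptime) minus starttime divided by sysconf(_SC_CLK_TCK).

```c
#include <stdio.h>
#include <string.h>

/* Fields taken from one line of /proc/<pid>/stat (Linux layout assumed). */
struct stat_fields {
    long               session;    /* field 6 */
    unsigned long long starttime;  /* field 22, clock ticks since boot */
    unsigned long      vsize;      /* field 23, bytes */
    long               rss_pages;  /* field 24, pages */
};

/* Returns 0 on success, -1 if the line does not parse. */
static int parse_proc_stat(const char *line, struct stat_fields *out)
{
    /* comm (field 2) may contain spaces or ')', so skip past the LAST ')' */
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    /* Fields 3..24: state, ppid, pgrp, session, ..., starttime, vsize, rss */
    int n = sscanf(p + 1,
                   " %*c %*d %*d %ld %*d %*d"               /* 3..8   */
                   " %*lu %*lu %*lu %*lu %*lu %*lu %*lu"    /* 9..15  */
                   " %*ld %*ld %*ld %*ld %*ld %*ld"         /* 16..21 */
                   " %llu %lu %ld",                         /* 22..24 */
                   &out->session, &out->starttime,
                   &out->vsize, &out->rss_pages);
    return n == 4 ? 0 : -1;
}

/* Age in seconds, given uptime in seconds and ticks per second. */
static double proc_age_seconds(unsigned long long starttime,
                               double uptime_sec, long ticks_per_sec)
{
    return uptime_sec - (double)starttime / (double)ticks_per_sec;
}
```

Comparing proc_age_seconds() against 2.0 would give a command-line equivalent of the "ignore young processes" check.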