[torqueusers] Job exceeding memory limits

David Singleton David.Singleton at anu.edu.au
Wed Apr 18 06:11:24 MDT 2007



Steve Young wrote:
> 	Thanks Dave. This is why I am wondering how torque checks an OS to
> verify how much memory is being used. I suspect that when the job is
> first being started that a lot more resources are used but after it's
> underway it evens out to expected operation. I am hoping once I can find
> out how torque does it that perhaps I can do the same from command line
> to try to find out for myself why torque thinks that it needs so much
> more memory. 

Basically, just add up what ps aux reports for every process belonging
to the job: the VSZ column for vmem and the RSS column for mem.
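
As a rough illustration of doing that sum by hand (a sketch only, not
what MOM does internally), the C helper below takes text with one
"SID VSZ RSS" line per process, such as the output of something like
`ps -e -o sid= -o vsz= -o rss=`, and adds up VSZ and RSS for one session
ID. The names sum_session_kb and struct mem_totals and the three-column
input format are assumptions made for this example. Check your ps man
page for the column names and units; they are usually kilobytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Totals for one session, in whatever units ps printed (usually KiB). */
struct mem_totals {
    unsigned long long vsz;   /* sum of virtual sizes, compare with vmem */
    unsigned long long rss;   /* sum of resident sizes, compare with mem */
    int nprocs;               /* number of processes counted */
};

/*
 * Hypothetical helper: scan lines of the form "SID VSZ RSS" and add up
 * VSZ and RSS for every line whose SID equals want_sid.
 * Lines that do not parse as three numbers are skipped.
 */
static struct mem_totals sum_session_kb(const char *text, long want_sid)
{
    struct mem_totals t = {0, 0, 0};
    const char *line = text;

    assert(text != NULL);

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[128];
        long sid;
        unsigned long long vsz, rss;

        /* Copy one line so sscanf cannot read across a newline. */
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        if (sscanf(buf, "%ld %llu %llu", &sid, &vsz, &rss) == 3
            && sid == want_sid) {
            t.vsz += vsz;
            t.rss += rss;
            t.nprocs++;
        }

        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return t;
}
```

Convert the two totals to bytes before comparing them with the job's
vmem and mem limits.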

> 	You bring up an interesting point... having MOM ignore resource usage
> for young processes. I didn't see anything on parameters page for MOM to
> configure this. Would you mind elaborating on how you did that? =).
> Thanks in advance,
> 

Fairly simplistically.  These are cut-down versions of the routines in
mom_mach.c for finding job vmem (mem_sum) and mem (resi_sum)
respectively (I've chopped out the gory shared-memory details but left
gratuitous macros in).

David


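/*
 * Sum the virtual memory size (vmem) of every process in the job.
 * proc_info[], nproc and time_now are MOM globals not shown here.
 */
static memsize_t mem_sum(job *pjob)
{
        char       *id = "mem_sum";   /* unused in this cut-down version */
        memsize_t   memsize = 0;
        int         iproc;

        for (iproc = 0; iproc < nproc; iproc++) {
                psinfo_t *pi = &proc_info[iproc];

                /* skip processes whose session is not part of this job */
                if (!injob(pjob, pi->pr_sid))
                        continue;

                /*
                 * A feeble attempt to ignore the memory use of recently
                 * forked processes: skip anything less than 2 seconds old.
                 */
                if (time_now < (time_t)ISECS(pi->pr_start) + 2)
                        continue;

                /* values at or above PROC_MEM_MAX are not counted */
                if (PRVMEM_TO_BYTES(pi->pr_size) < PROC_MEM_MAX)
                        memsize += PRVMEM_TO_BYTES(pi->pr_size);
        }

        return (memsize);
}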


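/*
 * Sum the resident set size (mem) of every process in the job.
 * proc_info[], nproc and time_now are MOM globals not shown here.
 */
static memsize_t resi_sum(job *pjob)
{
        char       *id = "resi_sum";  /* unused in this cut-down version */
        memsize_t   resisize = 0;
        int         iproc;

        for (iproc = 0; iproc < nproc; iproc++) {
                psinfo_t *pi = &proc_info[iproc];

                /* skip processes whose session is not part of this job */
                if (!injob(pjob, pi->pr_sid))
                        continue;

                /*
                 * A feeble attempt to ignore the memory use of recently
                 * forked processes: skip anything less than 2 seconds old.
                 */
                if (time_now < (time_t)ISECS(pi->pr_start) + 2)
                        continue;

                /* values at or above PROC_MEM_MAX are not counted */
                if (PRRSS_TO_BYTES(pi->pr_rssize) < PROC_MEM_MAX)
                        resisize += PRRSS_TO_BYTES(pi->pr_rssize);
        }

        return (resisize);
}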
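
To see what the 2-second rule buys you, here is a self-contained toy
version with a made-up process record in place of psinfo_t. The names
struct toy_proc, sum_vmem and MIN_AGE_SECS are invented for this
example and are not Torque code. A process that has just forked looks
like two full-size processes until the child execs or exits, so
skipping the young child keeps the sum from briefly doubling.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MIN_AGE_SECS 2   /* same 2-second threshold as the routines above */

/* Invented stand-in for the psinfo_t fields used above. */
struct toy_proc {
    long               sid;         /* session ID */
    time_t             start;       /* process start time, in seconds */
    unsigned long long vmem_bytes;  /* virtual size, in bytes */
};

/* Sum vmem over the processes in job_sid that are at least 2 seconds old. */
static unsigned long long sum_vmem(const struct toy_proc *procs, size_t n,
                                   long job_sid, time_t now)
{
    unsigned long long total = 0;
    size_t i;

    assert(procs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (procs[i].sid != job_sid)
            continue;                       /* not part of this job */
        if (now < procs[i].start + MIN_AGE_SECS)
            continue;                       /* too young, ignore for now */
        total += procs[i].vmem_bytes;
    }
    return total;
}
```

With a parent started long ago and a child forked one second before
the sample, only the parent is counted; one second later both are.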



