[torqueusers] tracejob output question

Steve Snelgrove ssnelgrove at clusterresources.com
Thu Apr 10 08:55:18 MDT 2008


Chris Samuel wrote:
> Hi all,
>
> We've just been asked by one of our users:
>
>   
>> Tracejob shows the memory (and virtual memory) used by
>> a program. Is that the peak memory used? Average memory
>> used? or what. 
>>     
>
> Now we know that tracejob grabs stuff out of the PBS logs,
> but the question is still are the numbers that are recorded
> there the:
>
> 1) Maximum usage
> 2) Average usage
> 3) Usage when job ended (or last measured)
> 4) Something else
>
> Any ideas ?
>
> cheers!
> Chris
>   
I hope this is the right answer.  As far as I can tell, the mom gets 
its info about a job by reading the file "/proc/<pid>/stat" for each 
process.  This file contains one line, which is parsed in mom_mach.c 
with the following:


  /* see stat_str[] value for mapping 'stat' format */
  if (sscanf(lastbracket,stat_str,
        &ps.state,     /* state (one of RSDZTW) */
        &ps.ppid,      /* ppid */
        &ps.pgrp,      /* pgrp */
        &ps.session,   /* session id */
        &ps.flags,     /* flags - kernel flags of the process,
                          see the PF_* in <linux/sched.h> */
        &ps.utime,     /* utime - jiffies that this process has been
                          scheduled in user mode */
        &ps.stime,     /* stime - jiffies that this process has been
                          scheduled in kernel mode */
        &ps.cutime,    /* cutime - jiffies that this process's waited-for
                          children have been scheduled in user mode */
        &ps.cstime,    /* cstime - jiffies that this process's waited-for
                          children have been scheduled in kernel mode */
        &jstarttime,   /* starttime */
        &ps.vsize,     /* vsize */
        &ps.rss) != 12)   /* rss */
    {

This information is collected for every process running on the 
system.  Since a job may have multiple processes associated with it, the 
value saved in JOB_ATR_resc_used is a sum over all processes whose 
session ID matches the job's.

So what is reported for mem is the sum of rss * page_size over those 
processes (rss in the stat file is counted in pages).

For vmem, it is the sum of vsize (already in bytes) over the same 
processes.
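
To make the arithmetic concrete, here is a rough sketch of the same idea. 
This is not the TORQUE code: parse_stat_line(), accumulate(), struct 
proc_sample and struct job_usage are names I made up for illustration, 
and the field positions are taken from the documented /proc/<pid>/stat 
layout rather than from stat_str[].

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sample of the three fields we care about (not a TORQUE type). */
struct proc_sample {
    int session;               /* session id (field 6 of the stat line) */
    unsigned long long vsize;  /* virtual memory size in bytes (field 23) */
    long rss;                  /* resident set size in pages (field 24) */
};

/* Hypothetical per-job running totals, both in bytes. */
struct job_usage {
    unsigned long long mem_bytes;
    unsigned long long vmem_bytes;
};

/* Parse one /proc/<pid>/stat line. Returns 0 on success, -1 on failure. */
static int parse_stat_line(const char *line, struct proc_sample *out)
{
    /* The command name may itself contain spaces or ')', so start
       scanning after the LAST closing bracket on the line. */
    const char *p = strrchr(line, ')');

    if (p == NULL)
        return -1;

    /* After ')': state, ppid, pgrp, session, then 16 fields we skip
       (tty_nr through starttime), then vsize and rss. */
    if (sscanf(p + 1,
               " %*c %*d %*d %d"
               " %*s %*s %*s %*s %*s %*s %*s %*s"
               " %*s %*s %*s %*s %*s %*s %*s %*s"
               " %llu %ld",
               &out->session, &out->vsize, &out->rss) != 3)
        return -1;

    return 0;
}

/* Add one process to the job totals if its session id matches the job's. */
static void accumulate(struct job_usage *u, const struct proc_sample *s,
                       int job_session, long page_size)
{
    if (s->session != job_session)
        return;

    u->mem_bytes  += (unsigned long long)s->rss * (unsigned long long)page_size;
    u->vmem_bytes += s->vsize;
}
```

In real use you would call parse_stat_line() on the contents of each 
/proc/<pid>/stat file and pass sysconf(_SC_PAGESIZE) from <unistd.h> as 
page_size.  For example, with a 4096 byte page, two processes in the 
job's session with rss of 250 and 100 pages give 
mem = (250 + 100) * 4096 = 1433600 bytes.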

Hope this helps a little.