[torqueusers] torque 2.3.3 not reporting CPU time used

Caird, Andrew J acaird at umich.edu
Wed Sep 3 11:22:26 MDT 2008


Hello all,

We've seen a few cases where pbs_mom in Torque 2.3.3 doesn't report the CPU time used by a job.

With the pbs_mom log level turned up to 8, I see:

09/03/2008 13:03:44;0080;   pbs_mom;Svr;mom_get_sample;proc_array load started
09/03/2008 13:03:44;0080;   pbs_mom;n/a;mom_get_sample;proc_array loaded - nproc=134
09/03/2008 13:03:44;0080;   pbs_mom;n/a;cput_sum;proc_array loop start - jobid = 1437922.nyx.engin.umich.edu
09/03/2008 13:03:44;0002;   pbs_mom;n/a;cput_sum;cput_sum: session=28083 pid=28083 cputime=0 (cputfactor=1.000000)
09/03/2008 13:03:44;0002;   pbs_mom;n/a;cput_sum;cput_sum: session=28083 pid=28226 cputime=0 (cputfactor=1.000000)
09/03/2008 13:03:44;0080;   pbs_mom;n/a;mem_sum;proc_array loop start - jobid = 1437922.nyx.engin.umich.edu
09/03/2008 13:03:44;0080;   pbs_mom;n/a;resi_sum;proc_array loop start - jobid = 1437922.nyx.engin.umich.edu
09/03/2008 13:03:44;0008;   pbs_mom;Req;send_sisters;sending command POLL_JOB for job 1437922.nyx.engin.umich.edu (7)

This is a 4-task job on a single node with no other tasks on that node, so no other MOMs or jobs are involved besides this one.
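
For anyone who wants to compare against their own logs, here is a minimal Python sketch that pulls the per-process samples out of the cput_sum lines shown above. The regular expression is only an assumption based on the two lines in this excerpt, not a documented log format, and the name sampled_pids is made up for illustration.

```python
import re

# The two cput_sum lines from the pbs_mom log excerpt above.
LOG = """\
09/03/2008 13:03:44;0002;   pbs_mom;n/a;cput_sum;cput_sum: session=28083 pid=28083 cputime=0 (cputfactor=1.000000)
09/03/2008 13:03:44;0002;   pbs_mom;n/a;cput_sum;cput_sum: session=28083 pid=28226 cputime=0 (cputfactor=1.000000)
"""

# Assumed pattern, inferred only from the lines above.
CPUT_LINE = re.compile(r"cput_sum: session=(\d+) pid=(\d+) cputime=(\d+)")


def sampled_pids(log_text):
    """Map each pid seen by cput_sum to its (session, cputime) pair."""
    samples = {}
    for match in CPUT_LINE.finditer(log_text):
        session, pid, cputime = (int(group) for group in match.groups())
        samples[pid] = (session, cputime)
    return samples


if __name__ == "__main__":
    for pid, (session, cputime) in sorted(sampled_pids(LOG).items()):
        print(f"pid={pid} session={session} cputime={cputime}")
```

Any job process that is running on the node but missing from this mapping was not counted in that sample.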


[root@node378 ~]# ps -ef | egrep PPID\|pbs_mom\|28083\|28226
UID       PID  PPID  C STIME TTY          TIME CMD
root     3999     1  0 Aug20 ?        00:02:16 /usr/local/torque/sbin/pbs_mom -p
user1   28083  3999  0 Aug27 ?        00:00:00 -sh
user1   28226 28083  0 Aug27 ?        00:00:00 /bin/sh /var/spool/PBS/mom_priv/jobs/1437922.nyx.engin.umich.edu.SC
user1   28227 28226 99 Aug27 ?        7-01:43:28 ./tortusorMFPA6.out

The proc_array loop seems to look at only two PIDs (28083 and 28226 in this case) and not at the third related PID (28227, the child of 28226), which is the process that has accumulated all of the CPU time.
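
As a cross-check on what the total should be, here is a rough Python sketch that sums the TIME column over the whole process tree under the job's top shell, using the rows from the ps listing above. This is not how pbs_mom computes cput (the log suggests it matches processes by session); it walks parent/child links purely as an independent estimate, and the helper names are made up for illustration.

```python
# (pid, ppid, TIME) rows copied from the ps listing above.
PS_ROWS = [
    (3999, 1, "00:02:16"),
    (28083, 3999, "00:00:00"),
    (28226, 28083, "00:00:00"),
    (28227, 28226, "7-01:43:28"),
]


def parse_ps_time(text):
    """Convert a ps TIME field of the form [DD-]HH:MM:SS to seconds."""
    days, _, clock = text.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def descendants(root, rows):
    """Return root plus every pid reachable from it through PPID links."""
    children = {}
    for pid, ppid, _ in rows:
        children.setdefault(ppid, []).append(pid)
    found, stack = [], [root]
    while stack:
        pid = stack.pop()
        found.append(pid)
        stack.extend(children.get(pid, []))
    return found


def tree_cpu_seconds(root, rows):
    """Sum the CPU seconds of root and all of its descendants."""
    times = {pid: parse_ps_time(time_field) for pid, _, time_field in rows}
    return sum(times[pid] for pid in descendants(root, rows))


if __name__ == "__main__":
    print(tree_cpu_seconds(28083, PS_ROWS))
```

For the listing above, the tree under 28083 contains 28083, 28226 and 28227, so nearly all of the total comes from 28227, while the two PIDs in the log contribute zero.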

Has anyone else noticed this?  Am I even reporting useful information?

Thanks.
--andy



