[torqueusers] reported cpu time during running parallel jobs in torque 2.1.3...

David Golden dgolden at cp.dias.ie
Wed Oct 18 10:40:40 MDT 2006

Well, perhaps in some sort of karmic revenge after the on-list discussion of 
cput accounting a while back, I just tried upgrading to torque 2.1.3, and 
something strange seems to be going on with _recent_ torque:

The resources_used.cput number ultimately reported in 
e.g. /var/spool/pbs/server_priv/accounting/ for 
parallel jobs still seems accurate enough.

However, qstat -f is underreporting, even when the job is in the "C" state, 
as if it's only reporting the cput of the processes on the job's mother 
superior node - and I think the issue may also be mangling our Maui stats...

Is this just some odd configuration screwup on my part, or can 
others confirm this behaviour? (Please only reply if you're already using a 
process launcher that uses TM... in this case, my parallel job's processes 
were launched with OSC mpiexec.)

(A short test is below, but days-long jobs also exhibit the behaviour -
the accounting log shows what looks like the right value...)
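For anyone wanting to compare the two numbers directly, here's a small sketch 
(the to_secs helper and the two hard-coded values are just taken from the 
output pasted below, not from any torque tooling) that converts the HH:MM:SS 
cput strings to seconds:

```shell
#!/bin/bash
# Hypothetical helper: turn an HH:MM:SS cput string into seconds so the
# qstat -f value and the accounting-log value can be compared numerically.
to_secs() {
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

qstat_cput="00:06:31"   # resources_used.cput as shown by qstat -f
acct_cput="00:26:47"    # resources_used.cput from the accounting log
echo "qstat: $(to_secs "$qstat_cput")s  accounting: $(to_secs "$acct_cput")s"
```

That gives 391s vs 1607s - a ratio of about 4.1, which for a nodect=4 job is 
at least consistent with the "only the mother superior node's processes are 
being counted" theory.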


qstat -f 
Job Id: 64685.<myhost>
    Job_Name = parbusy.pbs
    Job_Owner = <myuser>@<myhost>
***    resources_used.cput = 00:06:31 
    resources_used.mem = 6604kb
    resources_used.vmem = 226340kb
    resources_used.walltime = 00:03:25
    job_state = C


cat /var/spool/pbs/server_priv/accounting/20061018 | grep 64685
10/18/2006 17:24:52;D;64685.<myhost>;requestor=<myuser>@<myhost>
10/18/2006 17:25:07;E;64685.<myhost>;user=<myuser> group=<mygroup> 
jobname=parbusy.pbs ctime=1161188491 qtime=1161188491 etime=1161188491 
Resource_List.cput=02:00:00 Resource_List.neednodes=4:ppn=2 
Resource_List.nodect=4 Resource_List.nodes=4:ppn=2 
Resource_List.walltime=01:00:00 session=14240 end=1161188707 Exit_status=0 
**** resources_used.cput=00:26:47 
resources_used.mem=6604kb resources_used.vmem=226340kb 
