[torqueusers] reported cpu time during running parallel jobs in torque 2.1.3...
David Golden
dgolden at cp.dias.ie
Wed Oct 18 10:40:40 MDT 2006
Well, perhaps in some sort of karmic revenge after the on-list discussion of
cput time accounting a while back, I just tried upgrading to torque 2.1.3, and
it seems something strange is going on with _recent_ torque:
The resources_used.cput number ultimately reported in
e.g. /var/spool/pbs/server_priv/accounting/ for
parallel jobs still seems accurate enough.
However, qstat -f is underreporting, even when the job is in the "C" state,
maybe as if it's only reporting the cput of the processes on the job's mother
superior node - and I think the issue might also be mangling our maui stats...
Is this just some odd configuration screwup on my part, or can
others confirm this behaviour? (Please, only if you're already using a process
launcher that uses TM... in this case, my parallel job's processes were
launched with OSC mpiexec.)
(short test below, but days-long jobs are also exhibiting the behaviour -
the accounting log shows what looks like the right value...)
-8<-----------
qstat -f
...
Job Id: 64685.<myhost>
Job_Name = parbusy.pbs
Job_Owner = <myuser>@<myhost>
*** resources_used.cput = 00:06:31
resources_used.mem = 6604kb
resources_used.vmem = 226340kb
resources_used.walltime = 00:03:25
job_state = C
-8<-----------
cat /var/spool/pbs/server_priv/accounting/20061018 | grep 64685
...
10/18/2006 17:24:52;D;64685.<myhost>;requestor=<myuser>@<myhost>
10/18/2006 17:25:07;E;64685.<myhost>;user=<myuser> group=<mygroup>
jobname=parbusy.pbs ctime=1161188491 qtime=1161188491 etime=1161188491
start=1161188492
Resource_List.cput=02:00:00 Resource_List.neednodes=4:ppn=2
Resource_List.nodect=4 Resource_List.nodes=4:ppn=2
Resource_List.walltime=01:00:00 session=14240 end=1161188707 Exit_status=0
**** resources_used.cput=00:26:47
resources_used.mem=6604kb resources_used.vmem=226340kb
resources_used.walltime=00:03:30
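
As a rough sanity check on the two excerpts above, here is a minimal Python sketch that divides cput by walltime to estimate how many cores each report accounts for. It is not part of torque; the helper names (hms_to_seconds, find_resource, busy_cores) are made up for illustration, and the input strings are just the relevant fields copied from the output above. It assumes resource times are always printed as HH:MM:SS.

```python
import re


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_resource(text: str, name: str) -> str:
    """Return the value of resources_used.<name> found in text.

    qstat -f prints "name = value" while the accounting log prints
    "name=value", so optional whitespace is allowed around the "=".
    """
    pattern = r"resources_used\." + re.escape(name) + r"\s*=\s*(\S+)"
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"resources_used.{name} not found")
    return match.group(1)


def busy_cores(text: str) -> float:
    """Estimate the average number of busy cores as cput / walltime."""
    cput = hms_to_seconds(find_resource(text, "cput"))
    wall = hms_to_seconds(find_resource(text, "walltime"))
    return cput / wall


# Fields copied from the qstat -f output for job 64685.
qstat_text = """
resources_used.cput = 00:06:31
resources_used.walltime = 00:03:25
"""

# Fields copied from the "E" record in the accounting log for the same job.
accounting_text = "resources_used.cput=00:26:47 resources_used.walltime=00:03:30"

print("qstat -f  :", round(busy_cores(qstat_text), 2), "cores")
print("accounting:", round(busy_cores(accounting_text), 2), "cores")
```

For this job the qstat -f figures work out to about 1.9 busy cores and the accounting figures to about 7.7. With nodes=4:ppn=2, that would be consistent with qstat -f counting only the two processes on one node while the accounting log counts all eight, though it does not prove which node is being counted.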