[torqueusers] reported cpu time during running parallel jobs in torque 2.1.3...
Garrick Staples
garrick at clusterresources.com
Wed Oct 18 12:26:18 MDT 2006
On Wed, Oct 18, 2006 at 05:40:40PM +0100, David Golden alleged:
> Well, perhaps in some sort of karmic revenge after the on-list discussion of
> cput time accounting a while back, I just tried upgrading to torque 2.1.3,
> and it seems something strange is going on with _recent_ torque:
>
> The resources_used.cput number ultimately reported in
> e.g. /var/spool/pbs/server_priv/accounting/ for
> parallel jobs still seems accurate enough
>
> However, qstat -f is underreporting, even when the job is in "C" state, maybe
> as if it's only reporting the cput of the processes on the job's mother
> superior node - and I think the issue might also be mangling our maui stats...
That's peculiar.
Looking...
> Is this just some odd configuration screwup on my part, or can
> others confirm this behaviour? (Please, only if you're already using a process
> launcher that uses TM... in this case, my parallel job's processes launched
> with OSC mpiexec)
>
>
> (short test below, but days-long jobs are also exhibiting the behaviour -
> the accounting log shows what looks like the right value...)
>
> -8<-----------
>
> qstat -f
> ...
> Job Id: 64685.<myhost>
> Job_Name = parbusy.pbs
> Job_Owner = <myuser>@<myhost>
> *** resources_used.cput = 00:06:31
> resources_used.mem = 6604kb
> resources_used.vmem = 226340kb
> resources_used.walltime = 00:03:25
> job_state = C
>
>
> -8<-----------
>
> cat /var/spool/pbs/server_priv/accounting/20061018 | grep 64685
> ...
> 10/18/2006 17:24:52;D;64685.<myhost>;requestor=<myuser>@<myhost>
> 10/18/2006 17:25:07;E;64685.<myhost>;user=<myuser> group=<mygroup>
> jobname=parbusy.pbs ctime=1161188491 qtime=1161188491 etime=1161188491
> start=1161188492
> Resource_List.cput=02:00:00 Resource_List.neednodes=4:ppn=2
> Resource_List.nodect=4 Resource_List.nodes=4:ppn=2
> Resource_List.walltime=01:00:00 session=14240 end=1161188707 Exit_status=0
> **** resources_used.cput=00:26:47
> resources_used.mem=6604kb resources_used.vmem=226340kb
> resources_used.walltime=00:03:30
>
>
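As a rough sanity check on the figures quoted above, the two cput values can be divided by their walltimes to see how many processors each one accounts for. This is only a sketch of the arithmetic: the helper name hms_to_seconds and the variable names are made up for illustration and are not part of TORQUE.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as printed by qstat, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the qstat -f output quoted above.
qstat_cput = hms_to_seconds("00:06:31")
qstat_wall = hms_to_seconds("00:03:25")

# Values copied from the accounting "E" record quoted above.
acct_cput = hms_to_seconds("00:26:47")
acct_wall = hms_to_seconds("00:03:30")

# cput / walltime approximates the number of processors kept busy.
qstat_busy = qstat_cput / qstat_wall
acct_busy = acct_cput / acct_wall

print(f"qstat -f:   {qstat_cput} s cput, ~{qstat_busy:.2f} busy processors")
print(f"accounting: {acct_cput} s cput, ~{acct_busy:.2f} busy processors")
```

The accounting record works out to roughly 7.65 busy processors, close to the 8 requested with nodes=4:ppn=2, while the qstat -f value works out to roughly 1.91, close to the 2 processors on a single node. That fits the guess that qstat -f is only counting the mother superior's processes, though it does not prove it.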