[torqueusers] reported cpu time during running parallel jobs in torque 2.1.3...

Garrick Staples garrick at clusterresources.com
Wed Oct 18 12:26:18 MDT 2006


On Wed, Oct 18, 2006 at 05:40:40PM +0100, David Golden alleged:
> Well, perhaps in some sort of karmic revenge after the on-list discussion of 
> cput time accounting a while back, I just tried upgrading to torque 2.1.3, and it 
> seems something strange is going on with _recent_ torque:
> 
> The resources_used.cput number ultimately reported in 
> e.g. /var/spool/pbs/server_priv/accounting/ for 
> parallel jobs still seems accurate enough.
> 
> However, qstat -f is underreporting, even when the job is in the "C" state, maybe 
> as if it's only reporting the job's mother superior node's processes 
> cput - and I think the issue might also be mangling our maui stats...

That's peculiar.

Looking...
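A quick sanity check on the mother-superior theory, using only the figures quoted further down in David's message: dividing total CPU time by walltime gives the average number of CPUs the job kept busy. This is a minimal sketch; the helper names (hms_to_seconds, effective_cpus) are hypothetical and not part of TORQUE.

```python
# Hypothetical helpers; the input values are copied from the qstat -f
# output and the accounting "E" record for job 64685 quoted in this thread.

def hms_to_seconds(hms: str) -> int:
    """Convert a PBS-style HH:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def effective_cpus(cput: str, walltime: str) -> float:
    """Average number of CPUs kept busy: total CPU time / elapsed time."""
    return hms_to_seconds(cput) / hms_to_seconds(walltime)


# Job 64685 requested nodes=4:ppn=2, i.e. 8 processor slots in total.
qstat_cpus = effective_cpus("00:06:31", "00:03:25")       # from qstat -f
accounting_cpus = effective_cpus("00:26:47", "00:03:30")  # from accounting log

print(f"qstat -f:   {qstat_cpus:.2f} CPUs busy on average")       # 1.91
print(f"accounting: {accounting_cpus:.2f} CPUs busy on average")  # 7.65
```

With 8 processor slots, the qstat -f figure works out to roughly one node's worth of CPUs (ppn=2), while the accounting figure is close to all eight. That fits the suspicion that qstat is only counting the mother superior's processes, though it does not prove it.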


 
> Is this just some odd configuration screwup on my part, or can 
> others confirm this behaviour? (Please reply only if you're already using a process 
> launcher that uses TM; in this case, my parallel job's processes were launched 
> with OSC mpiexec.)
> 
> 
> (short test below, but days-long jobs are also exhibiting the behaviour -
> the accounting log shows what looks like the right value...)
> 
> -8<-----------
> 
> qstat -f 
> ...
> Job Id: 64685.<myhost>
>     Job_Name = parbusy.pbs
>     Job_Owner = <myuser>@<myhost>
> ***    resources_used.cput = 00:06:31 
>     resources_used.mem = 6604kb
>     resources_used.vmem = 226340kb
>     resources_used.walltime = 00:03:25
>     job_state = C
> 
> 
> -8<-----------
> 
> cat /var/spool/pbs/server_priv/accounting/20061018 | grep 64685
> ...
> 10/18/2006 17:24:52;D;64685.<myhost>;requestor=<myuser>@<myhost>
> 10/18/2006 17:25:07;E;64685.<myhost>;user=<myuser> group=<mygroup> 
> jobname=parbusy.pbs ctime=1161188491 qtime=1161188491 etime=1161188491 
> start=1161188492 
> Resource_List.cput=02:00:00 Resource_List.neednodes=4:ppn=2 
> Resource_List.nodect=4 Resource_List.nodes=4:ppn=2 
> Resource_List.walltime=01:00:00 session=14240 end=1161188707 Exit_status=0 
> **** resources_used.cput=00:26:47 
> resources_used.mem=6604kb resources_used.vmem=226340kb 
> resources_used.walltime=00:03:30
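To pull the final resources_used.cput out of the accounting log without eyeballing grep output, the job's "E" (job end) record can be split into fields. This sketch assumes each record is a single line of the form "date time;type;job id;key=value key=value ..." as in the records quoted above (the mail client wrapped the E record across several lines). The function names are hypothetical, and attribute values containing spaces would need smarter parsing.

```python
def parse_accounting_record(line: str) -> dict:
    """Split one accounting log line into timestamp, type, job id and attributes.

    Assumed layout: 'MM/DD/YYYY HH:MM:SS;<type>;<job id>;key=value key=value ...'
    """
    timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    attrs = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            attrs[key] = value
    return {"timestamp": timestamp, "type": record_type,
            "job_id": job_id, "attrs": attrs}


def end_record_cput(lines, job_id: str):
    """Return resources_used.cput from the job's 'E' record, or None if absent."""
    for line in lines:
        if line.count(";") < 3:  # skip blank or malformed lines
            continue
        rec = parse_accounting_record(line)
        if rec["type"] == "E" and rec["job_id"] == job_id:
            return rec["attrs"].get("resources_used.cput")
    return None


# Job 64685's "E" record from this thread, unwrapped onto one line and
# abridged to a few of its fields.
record = (
    "10/18/2006 17:25:07;E;64685.<myhost>;user=<myuser> group=<mygroup> "
    "jobname=parbusy.pbs Resource_List.nodes=4:ppn=2 "
    "Exit_status=0 resources_used.cput=00:26:47 "
    "resources_used.walltime=00:03:30"
)

parsed = parse_accounting_record(record)
print(parsed["type"], parsed["job_id"], parsed["attrs"]["resources_used.cput"])
# E 64685.<myhost> 00:26:47
```

Against the real log, passing an open handle on /var/spool/pbs/server_priv/accounting/20061018 and the job id to end_record_cput should return the same value; comparing it with resources_used.cput from qstat -f makes the discrepancy easy to spot across many jobs.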

