[torqueusers] cpu_time & wall_time, wrong values?

Arnau Bria arnaubria at pic.es
Mon Jun 22 07:46:43 MDT 2009


Hi,

Our local accounting script found some inconsistencies in our Torque jobs.
It seems that some jobs use more CPU time than wall time, and some use up
their entire cput limit but do not end with exit status 271.

Some examples:

One with more cputime than walltime:

06/21/2009 23:57:39;E;5230774.pbs02.pic.es;user=cmprd007 group=cmprd 
jobname=STDIN queue=gmedium64 ctime=1245599133 qtime=1245599133 
etime=1245599133 start=1245599186 owner=cmprd007@ce07.pic.es 
exec_host=td072.pic.es/2 Resource_List.cput=12:00:00 
Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 
Resource_List.walltime=24:00:00 session=19897 end=1245621459 
Exit_status=271 resources_used.cput=12:00:23 
resources_used.mem=1317696kb resources_used.vmem=2533164kb 
resources_used.walltime=11:59:13

One where the cput limit is exceeded but the exit status is 143:

06/21/2009 17:13:03;E;5227211.pbs02.pic.es;user=cmprd007 group=cmprd 
jobname=STDIN queue=gmedium64 ctime=1245572919 qtime=1245572919 
etime=1245572919 start=1245572927 owner=cmprd007@ce07.pic.es 
exec_host=td135.pic.es/4 Resource_List.cput=12:00:00 
Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 
Resource_List.walltime=24:00:00 session=32726 end=1245597183 
Exit_status=143 resources_used.cput=12:00:52 
resources_used.mem=1235256kb resources_used.vmem=2489196kb 
resources_used.walltime=11:57:51

And one where the cput limit is exceeded but Exit_status=0:

06/16/2009 17:53:07;E;5157092.pbs02.pic.es;user=cmprd001 group=cmprd 
jobname=STDIN queue=gmedium64 ctime=1245143208 qtime=1245143208 
etime=1245143208 start=1245143284 owner=cmprd001@ce07.pic.es 
exec_host=td101.pic.es/0 Resource_List.cput=12:00:00 
Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 
Resource_List.walltime=24:00:00 session=28461 end=1245167587 
Exit_status=0 resources_used.cput=12:00:59 resources_used.mem=1222424kb 
resources_used.vmem=2524176kb resources_used.walltime=11:59:14
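
The checks described above can be sketched in a few lines of Python. This
is only an illustration, not our actual accounting script: the helper
names (parse_record, hms_to_seconds, check) are made up for this example,
and it assumes each E record sits on a single line in the accounting file
(the records above are wrapped by the mail client).

```python
import re


def parse_record(line):
    """Split one accounting line into timestamp, type, job id and key=value fields."""
    timestamp, rec_type, job_id, body = line.split(";", 3)
    # Every field in the body has the form key=value, separated by whitespace.
    fields = dict(re.findall(r"(\S+?)=(\S+)", body))
    return timestamp, rec_type, job_id, fields


def hms_to_seconds(text):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def check(fields):
    """Report the two conditions discussed above for one job."""
    cput = hms_to_seconds(fields["resources_used.cput"])
    wall = hms_to_seconds(fields["resources_used.walltime"])
    limit = hms_to_seconds(fields["Resource_List.cput"])
    return {
        "cput_gt_walltime": cput > wall,
        "cput_over_limit": cput > limit,
        "exit_status": int(fields["Exit_status"]),
    }


# First record from this message, joined back onto one line.
record = (
    "06/21/2009 23:57:39;E;5230774.pbs02.pic.es;user=cmprd007 group=cmprd "
    "jobname=STDIN queue=gmedium64 ctime=1245599133 qtime=1245599133 "
    "etime=1245599133 start=1245599186 owner=cmprd007@ce07.pic.es "
    "exec_host=td072.pic.es/2 Resource_List.cput=12:00:00 "
    "Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 "
    "Resource_List.walltime=24:00:00 session=19897 end=1245621459 "
    "Exit_status=271 resources_used.cput=12:00:23 "
    "resources_used.mem=1317696kb resources_used.vmem=2533164kb "
    "resources_used.walltime=11:59:13"
)

timestamp, rec_type, job_id, fields = parse_record(record)
result = check(fields)
print(job_id, result)
```

Running the same check over every E record in the accounting file would
list the jobs that went over their cput limit together with their exit
status.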


Our cputmult and wallmult settings are:
$cputmult 1.5873
$wallmult 1.5873
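
One way to see how the multipliers show up in a record is to compare the
reported walltime with the raw elapsed time (end minus start). The sketch
below does that for the first record above. It rests on two assumptions
that are not confirmed here: that resources_used.walltime is already
scaled by $wallmult, and that start and end are unscaled epoch seconds.
The variable names are only illustrative.

```python
def hms_to_seconds(text):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Multiplier from the mom config quoted above.
WALLMULT = 1.5873

# Values copied from the first record (job 5230774).
start, end = 1245599186, 1245621459
reported_walltime = hms_to_seconds("11:59:13")

# Raw elapsed seconds, assuming start/end are not scaled.
elapsed = end - start

# Ratio actually observed between reported walltime and elapsed time.
effective_ratio = reported_walltime / elapsed

print(f"elapsed (end - start):  {elapsed} s")
print(f"reported walltime:      {reported_walltime} s")
print(f"configured wallmult:    {WALLMULT}")
print(f"reported / elapsed:     {effective_ratio:.4f}")
```

If the assumptions hold, the last two printed values should agree; a
difference would suggest that the node applied a different multiplier or
that one of the assumptions is wrong.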


Maybe too many decimals?
Has anyone faced this problem before?

TIA,
Arnau

