[torqueusers] Duplicate charges created by maui

Eva Hocks hocks at sdsc.edu
Mon Oct 21 17:06:03 MDT 2013





Occasionally (not for every job) I see duplicate job charges: one in the
middle of a job run and one at the end. Any idea where the first
"completed" in the middle of the running job could come from? There is
nothing in the mom log or the server log to indicate that the job was
reported as "completed" by torque.

torque job run:

server:
10/18/2013 17:25:31;0008;PBS_Server.33699;Job;883125.tscc-mgr.local;Job Run at request of maui at tscc-mgr.local
10/18/2013 17:49:52;0010;PBS_Server.33552;Job;883125.tscc-mgr.local;Exit_status=0 resources_used.cput=03:01:10 resources_used.mem=903736kb resources_used.vmem=4254968kb resources_used.walltime=00:24:21

mom:
10/18/2013 17:25:32;0001;   pbs_mom.9618;Job;TMomFinalizeJob3;job 883125.tscc-mgr.local started, pid = 27027
10/18/2013 17:49:51;0080;   pbs_mom.9618;Job;883125.tscc-mgr.local;scan_for_terminated: job 883125.tscc-mgr.local task 1 terminated, sid=27027


maui reports the job twice as "Completed" and charges for each record:

883125                 0   8   zix009 ong-group  864000 Completed [home-ong:1]
1382142329 1382142331 1382142331 1382142832
[NONE] [NONE] [NONE] >=    0M >=      0M [ong-node] 1382142329   8    8 [NONE]:hi [RESTARTABLE] ong-group          [NONE] [NONE]   0 28063.80   DEFAULT      1      0M      0M      0M         0 2140000000 tscc-1-56 TSCC [NONE] [NONE] [DEFAULT] [NONE] [NONE]

883125                 0   8   zix009 ong-group  864000 Completed [home-ong:1]
1382142329 1382142331 1382142331 1382143792
[NONE] [NONE] [NONE] >=    0M >=      0M [ong-node] 1382142329   8    8 [NONE]:hi [RESTARTABLE] ong-group          [NONE] [NONE]   0 59108.80   DEFAULT      1      0M      0M      0M         0 2140000000 tscc-1-56 TSCC [NONE] [NONE] [DEFAULT] [NONE] [NONE]
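
A rough way to scan the maui records for such duplicates is sketched below. It is only an illustration and rests on assumptions: each record is taken to sit on a single whitespace-separated line (the records above are wrapped), the 7th field is the job state, and the 11th and 12th fields are the start and completion times in epoch seconds. The function name find_duplicate_completions is made up for this example, and the sample records are the two records above cut down to their first 12 fields.

```python
from collections import defaultdict


def find_duplicate_completions(lines):
    """Map job id -> list of (completion - start) seconds, for jobs
    that have more than one "Completed" record.

    Assumed layout (not verified against the maui documentation):
    field 1 = job id, field 7 = state, field 11 = start time,
    field 12 = completion time, all separated by whitespace.
    """
    elapsed = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip anything that is not a full "Completed" record.
        if len(fields) < 12 or fields[6] != "Completed":
            continue
        job_id = fields[0]
        start_time = int(fields[10])
        completion_time = int(fields[11])
        elapsed[job_id].append(completion_time - start_time)
    # Keep only jobs that were reported as completed more than once.
    return {job: runs for job, runs in elapsed.items() if len(runs) > 1}


# The two records from this post, truncated to the first 12 fields.
records = [
    "883125 0 8 zix009 ong-group 864000 Completed [home-ong:1] "
    "1382142329 1382142331 1382142331 1382142832",
    "883125 0 8 zix009 ong-group 864000 Completed [home-ong:1] "
    "1382142329 1382142331 1382142331 1382143792",
]

duplicates = find_duplicate_completions(records)
for job, runs in duplicates.items():
    print(job, runs)
```

For the two records above this gives 501 and 1461 seconds. The second value matches resources_used.walltime=00:24:21 in the server log, so the first record falls 8 minutes 21 seconds into the run.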


torque is version 4.2.5 and maui version 3.3.1.

Thanks for any ideas on how to track down this problem.
Eva


