[torqueusers] Missing Torque job accounting information

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Wed Aug 29 03:28:27 MDT 2012


Hi Shade (Cc: torqueusers list)

I looked at the Torque accounting file you sent me, and strangely 
your file doesn't contain any "resources_used" data whatsoever. Because 
missing data is treated as zero usage, the pbsacct script reports 
that there are NO accounting records from your cluster!

In your accounting file please consider this example line (jobs that 
Ended will have an ;E; in field no. 2):

08/26/2012 06:11:00;E;2043.mgt.cluster.net;user=masquelet group=users 
jobname=DAVIS queue=batch ctime=1345845922 qtime=1345845922 
etime=1345845922 start=1345889422 owner=masquelet at node256 
exec_host=(list truncated/OHN) Resource_List.neednodes=60:ppn=4 
Resource_List.nodect=60 Resource_List.nodes=60:ppn=4 
Resource_List.walltime=24:00:00 session=8892 end=1345975860 Exit_status=-10

In a correct example accounting file entry from our cluster you'll see 
additional usage information at the end of the line:

(data omitted) Exit_status=0 resources_used.cput=143:56:09 
resources_used.mem=40639016kb resources_used.vmem=56211260kb 
resources_used.walltime=03:36:06
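
To check how many "E" records in a file lack usage data, something like 
the following Python sketch may help. It is not part of pbsacct. It 
assumes each record has four ";"-separated fields and that the last field 
holds space-separated key=value pairs without embedded spaces. The 
function names are made up for illustration, and the "complete" record is 
synthetic (the quoted record with the usage fields from the second example 
appended).

```python
def parse_record(line):
    """Split one accounting line into (timestamp, type, job_id, attrs)."""
    timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    attrs = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:  # skip stray tokens that are not key=value pairs
            attrs[key] = value
    return timestamp, record_type, job_id, attrs


def count_end_records(lines):
    """Return (E records with usage data, E records without)."""
    with_usage = without_usage = 0
    for line in lines:
        if not line.strip():
            continue
        _, record_type, _, attrs = parse_record(line)
        if record_type != "E":
            continue
        if any(key.startswith("resources_used.") for key in attrs):
            with_usage += 1
        else:
            without_usage += 1
    return with_usage, without_usage


# Shortened copy of the record quoted above (no usage data).
missing = ("08/26/2012 06:11:00;E;2043.mgt.cluster.net;user=masquelet "
           "group=users jobname=DAVIS queue=batch session=8892 "
           "end=1345975860 Exit_status=-10")
# Synthetic record: the same line with the usage fields from the
# second example appended, purely for illustration.
complete = (missing.replace("Exit_status=-10", "Exit_status=0")
            + " resources_used.cput=143:56:09 resources_used.mem=40639016kb"
            " resources_used.vmem=56211260kb resources_used.walltime=03:36:06")

print(count_end_records([missing, complete]))  # (1, 1)
```

To scan a real file, pass the open file object instead of the list, for 
example count_end_records(open("20120826")).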

Questions to the Torqueusers list
---------------------------------

Under what circumstances may jobs that finish have an "E" entry in the 
accounting file which is missing all the resources_used data?

Is this due to a sysadmin misconfiguration?  How can this error be 
corrected?

Shade: Will you kindly reply to the list with information about your 
version of Torque (run "qstat --version")?

Regards,
Ole

On 08/28/2012 09:59 PM, Shade Alabsa wrote:
>      I recently came across your script for torque accounting. I have
> tried to use it with the current version of Torque and it complains that
> there are no accounting records in the input files and I was wondering
> if you could help me in figuring out what I'm doing wrong. Below is a
> copy of what I'm doing,
>
> [root at mgt accounting]# pwd
> /var/spool/torque/server_priv/accounting
> [root at mgt accounting]# pbsacct 2012****

Shade: The "*" matches zero or more characters, so "****" is redundant 
(it is equivalent to a single "*"). A more precise pattern would be 
"pbsacct 2012????" where "?" matches exactly 1 character.
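
The difference between these patterns can be checked with Python's 
fnmatch module, which implements shell-style wildcards. This is only a 
sketch, and the file names below are hypothetical: one normal daily file 
plus two names that should not be picked up.

```python
from fnmatch import fnmatchcase

# Hypothetical directory entries: one normal daily file plus two
# names that should not be picked up.
names = ["20120621", "2012", "20120621.bak"]

for pattern in ("2012****", "2012*", "2012????"):
    matched = [n for n in names if fnmatchcase(n, pattern)]
    print(pattern, matched)
```

Both "2012****" and "2012*" match all three names, while "2012????" 
matches only "20120621".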

You could also do "pbsacct 201207??" to select just the files from the 
month of July.  The pbsacct package includes a script pbsreportmonth 
which generates monthly accounting statistics. This is quite useful.

>
> Portable Batch System accounting statistics
> -------------------------------------------
>
> Processing a total of 69 accounting files... done.
> /usr/local/bin/pbsacct ERROR: There are no accounting records in the
> input files:
> 20120621 20120622 20120623 20120624 20120625 20120626 20120627 20120628
> 20120629 20120630 20120701 20120702 20120703 20120704 20120705 20120706
> 20120707 20120708 20120709 20120710 20120711 20120712 20120713 20120714
> 20120715 20120716 20120717 20120718 20120719 20120720 20120721 20120722
> 20120723 20120724 20120725 20120726 20120727 20120728 20120729 20120730
> 20120731 20120801 20120802 20120803 20120804 20120805 20120806 20120807
> 20120808 20120809 20120810 20120811 20120812 20120813 20120814 20120815
> 20120816 20120817 20120818 20120819 20120820 20120821 20120822 20120823
> 20120824 20120825 20120826 20120827 20120828
>
> The four stars are because we installed torque in June and we need the
> accounting since then, and the logs only include dates since then.
> Also attached is an accounting file if that helps at all. Thank you!


-- 
Ole Holm Nielsen
Department of Physics, Technical University of Denmark

