[torqueusers] Torque Monthly Usage Accounting

etienne gondet etienne.gondet at mercator-ocean.fr
Fri Jan 6 08:09:10 MST 2006


Yes, that would be useful for pointing out problematic users.

In the accounting record, an Exit_status different from 0 should indicate
that something went wrong.
Exit status is not taken into account in pbsjobs and pbsacct, but it should be added,
with 2 columns: number of jobs and number of jobs with a non-zero Exit_status.

    Etienne Gondet

PS: I will try to add that.
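Here is a minimal sketch of the two proposed columns, written as a shell function around awk, the same tools pbsacct uses. It reads the raw accounting files directly rather than pbsjobs output. It assumes one record per line in the `date;type;jobid;key=value ...` layout of the sample record quoted below, and it counts only `E` (job end) records. The function name `exit_status_summary` is hypothetical.

```shell
# Hypothetical helper (not part of pbsacct): per-user count of completed
# jobs and of jobs whose Exit_status is non-zero.
# Assumes raw PBS/Torque accounting files with one record per line,
# laid out as "date;type;jobid;key=value key=value ...".
exit_status_summary() {
    awk -F';' '
    $2 == "E" {                       # job end records only
        user = ""; status = ""
        n = split($4, kv, " ")
        for (i = 1; i <= n; i++) {
            if (kv[i] ~ /^user=/)        user   = substr(kv[i], 6)
            if (kv[i] ~ /^Exit_status=/) status = substr(kv[i], 13)
        }
        if (user == "") next
        jobs[user]++
        if (status != "" && status != "0") failed[user]++
    }
    END {
        # user, number of jobs, number of jobs with non-zero Exit_status
        for (u in jobs)
            printf("%-10s %7d %7d\n", u, jobs[u], failed[u] + 0)
    }' "$@" | sort
}
```

For example, `exit_status_summary $PBSHOME/server_priv/accounting/200601*` would list each user with the total number of jobs and the number that ended with a non-zero Exit_status.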


etienne gondet wrote:

>
>    hello,
>
> I just had a try to pbsacct. It's just the easy tools I was looking for.
>
> I tried to add the total cumulative CPU time, and I believe there is a
> mistake in the CPU computation.
>
> In pbsjobs, cput is taken from the value of resources_used.cput,
> which I believe is the CPU time summed over all nodes and processors
> per node (ppn). Can anybody confirm this point?
>
>                           Wallclock          Average Average  CPU
> Username    Group   #jobs     hours  Percent  #nodes  q-days  hours
> --------    -----   ----- ---------  ------- ------- -------  -----
>      TOTAL        -    1876   8248.34  100.00    4.41    0.00  3017.38
>      user1      red     745   3538.88   42.90    6.00    0.00  1229.49
>      user2      red     285   2382.64   28.89    2.99    0.00  1103.90
>
> But in pbsacct you multiply by the number of nodes a second time:
> line 108         cpunodes[user] += nodect*cput
> line 116         cpunodesecs += nodect*cput
>
> So I think the following would be more accurate:
> line 108         cpunodes[user] += cput
> line 116         cpunodesecs += cput
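To show the effect of this change, here is a small hypothetical shell/awk function that accumulates both versions side by side. It assumes the pbsjobs column layout documented in pbsacct's own comments: $2 = user, $5 = node count, $6 = CPU time in seconds. All other columns are ignored. The name `cpu_hours_compare` is made up for illustration.

```shell
# Hypothetical helper (not part of pbsacct): per-user CPU hours computed
# both ways. Assumes pbsjobs-style lines where, as in pbsacct's comments,
# $2 = user name, $5 = number of nodes, $6 = CPU time in seconds.
cpu_hours_compare() {
    awk '
    {
        cpu_old[$2] += $5 * $6    # original: nodect*cput
        cpu_new[$2] += $6         # proposed: cput taken as already summed over all CPUs
    }
    END {
        # user, CPU hours (original formula), CPU hours (proposed formula)
        for (u in cpu_new)
            printf("%s %.2f %.2f\n", u, cpu_old[u] / 3600, cpu_new[u] / 3600)
    }' "$@" | sort
}
```

Take the accounting record quoted below: 3 nodes and resources_used.cput=01:41:36, which is 6096 s. With those values, the original formula reports 5.08 CPU hours and the proposed one reports 1.69.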
>
>
> If I look at an accounting record, resources_used.cput=01:41:36 is greater
> than resources_used.walltime=00:51:22.
> That's why I think it is already the cumulative CPU time over all the
> processors (nodes*ppn).
>
> 01/05/2006 02:18:34;E;20020.baltic;user=mbenkiran group=mercator 
> jobname=SAM1V2_UV queue=long ctime=1136424430 qtime=1136424431 
> etime=1136424431 start=1136424432 
> exec_host=baltic-05/1+baltic-05/0+baltic-04/1+baltic-04/0+baltic-03/1+baltic-03/0 
> Resource_List.cput=12:30:00 Resource_List.neednodes=3:ppn=2 
> Resource_List.nodect=3 Resource_List.nodes=3:ppn=2 
> Resource_List.pcput=03:00:00 Resource_List.pmem=5888mb 
> Resource_List.pvmem=5888mb Resource_List.walltime=03:00:00 session=0 
> end=1136427514 Exit_status=0 resources_used.cput=01:41:36 
> resources_used.mem=4095000kb resources_used.vmem=3003072kb 
> resources_used.walltime=00:51:22
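The following sketch checks this against a single raw record. It converts resources_used.cput and resources_used.walltime from HH:MM:SS to seconds. It also counts the allocated CPUs as the number of `+`-separated entries in exec_host. That count is an assumption: it presumes one entry per processor slot, as in the record above. The function name `job_cpu_usage` is hypothetical, and the record must be on one line (the mail above wraps it).

```shell
# Hypothetical helper (not part of pbsacct): CPU usage of each "E" record.
# Assumes one record per line ("date;type;jobid;key=value ...") and that
# exec_host lists one "+"-separated entry per allocated processor.
job_cpu_usage() {
    awk -F';' '
    function hms(t,   p) {            # "HH:MM:SS" -> seconds
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    $2 == "E" {
        cput = wall = ncpus = 0
        n = split($4, kv, " ")
        for (i = 1; i <= n; i++) {
            split(kv[i], f, "=")
            if (f[1] == "resources_used.cput")     cput  = hms(f[2])
            if (f[1] == "resources_used.walltime") wall  = hms(f[2])
            if (f[1] == "exec_host")               ncpus = split(f[2], h, "[+]")
        }
        # busy = average number of busy CPUs, eff = cput / (wall * ncpus)
        if (wall > 0 && ncpus > 0)
            printf("%s cput=%d wall=%d ncpus=%d busy=%.2f eff=%.2f\n",
                   $3, cput, wall, ncpus, cput / wall, cput / (wall * ncpus))
    }' "$@"
}
```

For the record above, this prints `20020.baltic cput=6096 wall=3082 ncpus=6 busy=1.98 eff=0.33`. On average, about 2 of the 6 allocated CPUs were busy. A value above 1 supports the reading that resources_used.cput is already summed over all processors.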
>
>    Happy New Year to all Torque users.
>
> Ole Holm Nielsen wrote:
>
>> hpc.group at gmail.com wrote:
>>
>>> Does anyone know how to generate an accurate Torque monthly usage report
>>> based on the number of CPUs, not the number of nodes, for clusters and SMP
>>> machines? The report should include user ID, group, wall-clock time (hours),
>>> CPU time (hours) and number of CPUs. Please let me know, thanks.
>>
>>
>>
>> I wrote some really simple PBS accounting scripts for PBS (Torque and PBSPro)
>> some years ago, and this is what we still use. You may download the pbsacct
>> package from ftp://ftp.fysik.dtu.dk/pub/PBS/
>>
>> Regards,
>> Ole
>>
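Here is a minimal sketch of such a CPU-based monthly report. It works directly on the raw accounting files rather than going through pbsacct. It assumes one `E` record per line and takes the CPU count from the number of `+`-separated exec_host entries. It reports allocated CPU hours as walltime * number of CPUs, the same walltime*NCPUS idea as the modified pbsacct below. The function name `cpu_usage_report` is hypothetical.

```shell
# Hypothetical helper (not part of pbsacct): per-user report of
# user, group, jobs, wallclock hours, CPU hours (resources_used.cput)
# and allocated CPU hours (walltime * ncpus).
# Assumes one "E" record per line and one exec_host entry per CPU.
cpu_usage_report() {
    awk -F';' '
    function hms(t,   p) {            # "HH:MM:SS" -> seconds
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    $2 == "E" {
        user = group = ""
        cput = wall = ncpus = 0
        n = split($4, kv, " ")
        for (i = 1; i <= n; i++) {
            split(kv[i], f, "=")
            if (f[1] == "user")                         user  = f[2]
            else if (f[1] == "group")                   group = f[2]
            else if (f[1] == "resources_used.cput")     cput  = hms(f[2])
            else if (f[1] == "resources_used.walltime") wall  = hms(f[2])
            else if (f[1] == "exec_host")               ncpus = split(f[2], h, "[+]")
        }
        if (user == "") next
        grp[user] = group
        jobs[user]++
        wallsec[user]  += wall
        cpusec[user]   += cput
        allocsec[user] += wall * ncpus
    }
    END {
        for (u in jobs)
            printf("%s %s %d %.2f %.2f %.2f\n", u, grp[u], jobs[u],
                   wallsec[u] / 3600, cpusec[u] / 3600, allocsec[u] / 3600)
    }' "$@" | sort
}
```

With daily files named by date, `cpu_usage_report $PBSHOME/server_priv/accounting/200601*` would cover January 2006.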
>
>------------------------------------------------------------------------
>
>#!/bin/sh
>
># Summarize USER accounting information from PBS accounting files
># located in $PBSHOME/server_priv/accounting/
>
># The accompanying script "pbsjobs" extracts simplified records
># of completed jobs.
>
># Usage: pbsacct <accounting-files>
># where <accounting-files> are daily PBS records (such as 20000705)
># Author:	Ole.H.Nielsen at fysik.dtu.dk
># Thanks to:	Miroslaw.Prywata at fuw.edu.pl
>
>#---------------------------------------------------------------
>
>#BINDIR=/usr/local/bin
>BINDIR=/home/mercator/64/bin
>GROUPID=""
>
>if [ -z "$1" ] ; then
>	echo "Usage: $0 [-g groupid] accounting-files";
>	exit 1
>fi
>
># 
>case $1 in
>	-g) GROUPID=$2
>	    shift; shift;
>esac
>
># Accounting-files:
>ACCT_FILES=$*
>NUM_FILES=$#
># Sanity check
>for f in ${ACCT_FILES}
>do
>	if [ ! -r $f ]
>	then
>		echo ERROR: File $f is unreadable:
>		ls -la $f
>		exit 1
>	fi
>done
>
># The pbsjobs accounting-information extractor script:
># May be set by an environment variable.
>if [ -z "${PBSJOBS}" ] ; then
>	PBSJOBS="${BINDIR}/pbsjobs";
>fi
>if [ ! -x "${PBSJOBS}" ] ; then
>	echo No ${PBSJOBS} executable found
>	exit 1
>fi
>
># A working file
>JOBTEMP=/tmp/pbsjobs.$$
># Trap error signals:
>trap "rm -f ${JOBTEMP}; exit 2" 1 2 3 14 15 19
>
>#---------------------------------------------------------------
>
># List the input files 
>echo
>echo "Portable Batch System USER accounting statistics"
>echo "------------------------------------------------"
>echo
>echo A total of $NUM_FILES accounting files will be processed.
>
>rm -f ${JOBTEMP}
>cat ${ACCT_FILES} | ${PBSJOBS} > ${JOBTEMP}
>
>cat ${JOBTEMP} | awk '
>{
>	if (NR == 1) firstdate=$7
>	lastdate=$7
>} END {
>	printf("The first record is dated %s, last record is dated %s.\n",
>		firstdate, lastdate)
>}'
>
>#---------------------------------------------------------------
>
>echo
>echo "                          Wallclock          Average Average  CPU"
>echo "Username    Group   #jobs     hours  Percent  #nodes  q-days  hours"
>echo "--------    -----   ----- ---------  ------- ------- -------  -----"
>
>cat ${JOBTEMP} | awk -v GROUPID="$GROUPID" '
>{
>	# Parse input data
>	user	= $2		# User name
>	group	= $3		# Group name
>	queue	= $4		# Queue name
>	nodect	= $5		# Number of nodes used
>	cput	= $6		# CPU time in seconds
>	wall	= $9		# Wallclock time in seconds
>	wait	= $11		# Waiting time in seconds
>	total_ncpus = $12	# Total number of CPUs used (>=nodect)
>
>	#
>	# For accounting by number of CPUs instead of number of nodes
>	# (ETG modification, so that SBU = walltime*NCPUS):
>	nodect = total_ncpus
>
>	username[user] = user
>	groupname[user] = group
>	jobs[user]++
>#ETG	cpunodes[user] += nodect*cput
>	cpunodes[user] += cput
>	wallnodes[user] += nodect*wall
>	wallcpu[user] += wall
>	if (nodect < minnodes[user]) minnodes[user] = nodect
>	if (nodect > maxnodes[user]) maxnodes[user] = nodect
>	waittime[user] += wait
>	totaljobs++
>	totalwait += wait
>#ETG	cpunodesecs += nodect*cput
>	cpunodesecs += cput
>	wallnodesecs += nodect*wall
>	wallsecs += wall
>} END {
>	cpunodedays = cpunodesecs / 86400
>	wallnodedays = wallnodesecs / 86400
>	walldays = wallsecs / 86400
>	groupjobs = 0
>	groupdays = 0
>	for (user in username) {
>		if (length(GROUPID) > 0 && groupname[user] != GROUPID) continue
>		if (wallcpu[user] > 0)
>			printf("%10s %8s %7d  %8.2f  %6.2f %7.2f %7.2f %8.2f\n",
>			username[user], groupname[user], jobs[user], 
>			wallnodes[user]/3600, wallnodes[user]/(864*wallnodedays),
>			wallnodes[user]/wallcpu[user], waittime[user]/jobs[user]/86400,
>			cpunodes[user]/3600)
>		groupjobs += jobs[user]
>		groupnodedays += wallnodes[user]/86400
>		groupdays += wallcpu[user]/86400
>		groupwait += waittime[user]
>	}
>	printf("%10s %8s %7d  %8.2f  %6.2f %7.2f %7.2f %8.2f\n",
>		"TOTAL", "-", totaljobs, wallnodesecs/3600, 100,
>		wallnodedays/walldays, totalwait/totaljobs/86400, cpunodesecs/3600)
>	if (length(GROUPID) > 0 && groupjobs > 0)
>		printf("%10s %8s %7d  %8.2f  %6.2f %7.2f %7.2f\n",
>			"GROUP", GROUPID, groupjobs, 24*groupnodedays,
>			100*groupnodedays/wallnodedays,
>			groupnodedays/groupdays, groupwait/groupjobs/86400)
>		
>} ' | sort -r -n -k 4
>
>rm -f ${JOBTEMP}
>exit 0
>  
>
