[torqueusers] Torque Monthly Usage Accounting

etienne gondet etienne.gondet at mercator-ocean.fr
Fri Jan 6 07:49:23 MST 2006


    hello,

I just tried pbsacct. It is just the simple tool I was looking for.

I tried to add a total cumulated CPU column, and I believe there is a mistake
in the CPU computation.
 
In pbsjobs, cput is taken from resources_used.cput, which I believe is
already the total CPU time summed over all nodes and processors per node
(ppn). Can anybody confirm this point?

                            Wallclock         Average Average      CPU
  Username    Group   #jobs     hours Percent  #nodes  q-days    hours
  --------    -----   -----     ----- -------  ------  ------    -----
     TOTAL        -    1876   8248.34  100.00    4.41    0.00  3017.38
     user1      red     745   3538.88   42.90    6.00    0.00  1229.49
     user2      red     285   2382.64   28.89    2.99    0.00  1103.90

But in pbsacct you multiply by the number of nodes again:
line 108         cpunodes[user] += nodect*cput
line 116         cpunodesecs += nodect*cput

So I believe the following would be more accurate:
line 108         cpunodes[user] += cput
line 116         cpunodesecs += cput
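To gauge the size of the error, take the job from the accounting record quoted below (Resource_List.nodes=3:ppn=2, i.e. 6 CPUs). With nodect = total_ncpus = 6, as in the attached script, the old formula and the job's CPU capacity work out as:

```latex
\begin{aligned}
\text{cput} &= 1\cdot 3600 + 41\cdot 60 + 36 = 6096\ \text{s}\\
\text{walltime} &= 51\cdot 60 + 22 = 3082\ \text{s}\\
\text{nodect}\cdot\text{cput} &= 6 \cdot 6096 = 36576\ \text{s} \approx 10.16\ \text{h}\\
\text{ncpus}\cdot\text{walltime} &= 6 \cdot 3082 = 18492\ \text{s} \approx 5.14\ \text{h}
\end{aligned}
```

The charged CPU time would exceed the most that six CPUs could deliver in 3082 s of wall time, which is impossible; counting cput once gives 6096 s, about 1.69 h. Even with nodect left as the node count (3), the job would be charged 3 x 6096 = 18288 s, three times what it actually used.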


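Looking at an accounting record (below), resources_used.cput=01:41:36 is
greater than resources_used.walltime=00:51:22. A single processor cannot
accumulate more CPU time than elapsed wall time, which is why I think cput is
already the CPU time accumulated over all nodes*ppn processors.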

01/05/2006 02:18:34;E;20020.baltic;user=mbenkiran group=mercator 
jobname=SAM1V2_UV queue=long ctime=1136424430 qtime=1136424431 
etime=1136424431 start=1136424432 
exec_host=baltic-05/1+baltic-05/0+baltic-04/1+baltic-04/0+baltic-03/1+baltic-03/0 
Resource_List.cput=12:30:00 Resource_List.neednodes=3:ppn=2 
Resource_List.nodect=3 Resource_List.nodes=3:ppn=2 
Resource_List.pcput=03:00:00 Resource_List.pmem=5888mb 
Resource_List.pvmem=5888mb Resource_List.walltime=03:00:00 session=0 
end=1136427514 Exit_status=0 resources_used.cput=01:41:36 
resources_used.mem=4095000kb resources_used.vmem=3003072kb 
resources_used.walltime=00:51:22
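To check this on more than one job, one can compute cput/walltime for every completed job. The sketch below is a minimal example, assuming the record layout shown above (date;E;jobid; followed by space-separated key=value pairs, one record per line in the actual file); the function name cpu_wall_ratio is hypothetical. A ratio above 1 can only occur if cput is summed over several CPUs.

```shell
#!/bin/sh
# Sketch: for every end-of-job ("E") record in a PBS/Torque accounting
# file, print the job id, resources_used.cput and resources_used.walltime
# in seconds, and the ratio cput/walltime.
cpu_wall_ratio() {
    awk -F';' '
    # Convert a [[HH:]MM:]SS duration to seconds.
    function secs(t,    n, i, p, s) {
        n = split(t, p, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + p[i]
        return s
    }
    $2 == "E" {
        cput = -1; wall = -1
        # Field 4 holds the space-separated key=value pairs of the record.
        n = split($4, kv, " ")
        for (i = 1; i <= n; i++) {
            eq  = index(kv[i], "=")
            key = substr(kv[i], 1, eq - 1)
            val = substr(kv[i], eq + 1)
            if (key == "resources_used.cput")     cput = secs(val)
            if (key == "resources_used.walltime") wall = secs(val)
        }
        # A ratio above 1 is only possible if cput is summed over several CPUs.
        if (cput >= 0 && wall > 0)
            printf("%s %d %d %.2f\n", $3, cput, wall, cput / wall)
    }' "$@"
}
```

Usage, for example: cpu_wall_ratio $PBSHOME/server_priv/accounting/20060105. On the record above it prints 20020.baltic 6096 3082 1.98, i.e. roughly two CPUs' worth of work over the job's wall time, out of the six allocated. The walltime itself is consistent with end - start = 1136427514 - 1136424432 = 3082 s.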

    Happy New Year to all Torque users.

Ole Holm Nielsen wrote:

> hpc.group at gmail.com wrote:
>
>> Does anyone know how to generate an accurate Torque monthly usage report
>> based on the number of CPUs, not the number of nodes, for a cluster and an
>> SMP machine? The report would include userid, group, wall-clock time
>> (hours), CPU time (hours) and number of CPUs. Please let me know, thanks.
>
>
> I wrote some really simple PBS accounting scripts for PBS (Torque and
> PBSPro) some years ago, and this is what we still use.  You may download
> the pbsacct package from ftp://ftp.fysik.dtu.dk/pub/PBS/
>
> Regards,
> Ole
>
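For reporting by CPUs rather than nodes, pbsacct reads total_ncpus from field 12 of the pbsjobs output; how pbsjobs derives it is not shown here. One way to get a CPU count straight from an accounting record is sketched below, assuming exec_host lists one host/slot entry per allocated CPU, joined by "+" (the record above, with nodes=3:ppn=2 and six entries, suggests this). The function names count_cpus, count_nodes and cpu_hours are hypothetical.

```shell
#!/bin/sh
# Sketch: derive CPU and node counts from a Torque exec_host value
# and charge wallclock time per CPU rather than per node.

# Number of CPU slots: one per "+"-separated host/slot entry.
count_cpus() {
    printf '%s\n' "$1" | awk -F'+' '{ print NF }'
}

# Number of distinct hosts (the part before "/").
count_nodes() {
    printf '%s\n' "$1" | tr '+' '\n' | cut -d/ -f1 | sort -u | wc -l | tr -d ' '
}

# CPU-hours charged: ncpus * walltime_seconds / 3600.
cpu_hours() {
    awk -v n="$1" -v w="$2" 'BEGIN { printf("%.2f\n", n * w / 3600) }'
}

exec_host='baltic-05/1+baltic-05/0+baltic-04/1+baltic-04/0+baltic-03/1+baltic-03/0'
ncpus=$(count_cpus "$exec_host")
nodes=$(count_nodes "$exec_host")
echo "nodes=$nodes ncpus=$ncpus cpu_hours=$(cpu_hours "$ncpus" 3082)"
```

For the example job this prints nodes=3 ncpus=6 cpu_hours=5.14 (6 CPUs x 3082 s of walltime). Charging by slots rather than nodes is what makes the report independent of how many CPUs each node has.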

-------------- next part --------------
#!/bin/sh

# Summarize USER accounting information from PBS accounting files
# located in $PBSHOME/server_priv/accounting/

# The accompanying script "pbsjobs" extracts simplified records
# of completed jobs.

# Usage: pbsacct <accounting-files>
# where <accounting-files> are daily PBS records (such as 20000705)
# Author:	Ole.H.Nielsen at fysik.dtu.dk
# Thanks to:	Miroslaw.Prywata at fuw.edu.pl

#---------------------------------------------------------------

#BINDIR=/usr/local/bin
BINDIR=/home/mercator/64/bin
GROUPID=""

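# Optional group filter: -g groupid
case $1 in
	-g) GROUPID=$2
	    shift; shift;;
esac

# At least one accounting file is required (checked after option parsing)
if [ $# -eq 0 ] ; then
	echo "Usage: $0 [-g groupid] accounting-files";
	exit 1
fi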

# Accounting-files:
ACCT_FILES=$*
NUM_FILES=$#
# Sanity check
for f in ${ACCT_FILES}
do
	if [ ! -r $f ]
	then
		echo ERROR: File $f is unreadable:
		ls -la $f
		exit 1
	fi
done

# The pbsjobs accounting-information extractor script:
# May be set by an environment variable.
if [ -z "${PBSJOBS}" ] ; then
	PBSJOBS="${BINDIR}/pbsjobs";
fi
if [ ! -x "${PBSJOBS}" ] ; then
	echo No ${PBSJOBS} executable found
	exit 1
fi

# A working file
JOBTEMP=/tmp/pbsjobs.$$
# Trap error signals:
trap "rm -f ${JOBTEMP}; exit 2" 1 2 3 14 15 19

#---------------------------------------------------------------

# List the input files 
echo
echo "Portable Batch System USER accounting statistics"
echo "------------------------------------------------"
echo
echo A total of $NUM_FILES accounting files will be processed.

rm -f ${JOBTEMP}
cat ${ACCT_FILES} | ${PBSJOBS} > ${JOBTEMP}

cat ${JOBTEMP} | awk '
{
	if (NR == 1) firstdate=$7
	lastdate=$7
} END {
	printf("The first record is dated %s, last record is dated %s.\n",
		firstdate, lastdate)
}'

#---------------------------------------------------------------

echo
echo "                          Wallclock          Average Average  CPU"
echo "Username    Group   #jobs     hours  Percent  #nodes  q-days  hours"
echo "--------    -----   ----- ---------  ------- ------- -------  -----"

cat ${JOBTEMP} | awk -v GROUPID="$GROUPID" '
{
	# Parse input data
	user	= $2		# User name
	group	= $3		# Group name
	queue	= $4		# Queue name
	nodect	= $5		# Number of nodes used
	cput	= $6		# CPU time in seconds
	wall	= $9		# Wallclock time in seconds
	wait	= $11		# Waiting time in seconds
	total_ncpus = $12	# Total number of CPUs used (>=nodect)

	# ETG modification: account by number of CPUs instead of number of
	# nodes (SBU = walltime*NCPUS).
	nodect = total_ncpus

	username[user] = user
	groupname[user] = group
	jobs[user]++
	# ETG: cput (resources_used.cput) is already summed over all CPUs,
	# so it is not multiplied by nodect again.
	cpunodes[user] += cput
	wallnodes[user] += nodect*wall
	wallcpu[user] += wall
	if (nodect < minnodes[user]) minnodes[user] = nodect
	if (nodect > maxnodes[user]) maxnodes[user] = nodect
	waittime[user] += wait
	totaljobs++
	totalwait += wait
	# ETG: cput is already summed over all CPUs; do not multiply by nodect.
	cpunodesecs += cput
	wallnodesecs += nodect*wall
	wallsecs += wall
} END {
	cpunodedays = cpunodesecs / 86400
	wallnodedays = wallnodesecs / 86400
	walldays = wallsecs / 86400
	groupjobs = 0
	groupdays = 0
	for (user in username) {
		if (length(GROUPID) > 0 && groupname[user] != GROUPID) continue
		if (wallcpu[user] > 0)
			printf("%10s %8s %7d  %8.2f  %6.2f %7.2f %7.2f %8.2f\n",
			username[user], groupname[user], jobs[user], 
			wallnodes[user]/3600, wallnodes[user]/(864*wallnodedays),
			wallnodes[user]/wallcpu[user], waittime[user]/jobs[user]/86400,
			cpunodes[user]/3600)
		groupjobs += jobs[user]
		groupnodedays += wallnodes[user]/86400
		groupdays += wallcpu[user]/86400
		groupwait += waittime[user]
	}
	printf("%10s %8s %7d  %8.2f  %6.2f %7.2f %7.2f %8.2f\n",
		"TOTAL", "-", totaljobs, wallnodesecs/3600, 100,
		wallnodedays/walldays, totalwait/totaljobs/86400, cpunodesecs/3600)
	# groupnodedays is in days; the Wallclock column is in hours
	if (length(GROUPID) > 0 && groupjobs > 0)
		printf("%10s %8s %7d  %8.2f  %6.2f %7.2f %7.2f\n",
			"GROUP", GROUPID, groupjobs, 24*groupnodedays,
			100*groupnodedays/wallnodedays,
			groupnodedays/groupdays, groupwait/groupjobs/86400)
		
} ' | sort -r -n -k 4,4

rm -f ${JOBTEMP}
exit 0
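The corrected accounting rule (charge wallclock per CPU, count cput once) can be sketched on its own as below. This is a minimal illustration, not pbsacct itself: the function name cpu_report is hypothetical, and it assumes the pbsjobs field positions given in the comments of pbsacct ($2 user, $6 cput in seconds, $9 wall in seconds, $12 total CPUs). It reports, per user, the number of jobs, CPU-wallclock hours allocated, CPU hours consumed, and their ratio as a percentage.

```shell
#!/bin/sh
# Sketch: per-user CPU accounting from pbsjobs output.
# cput is taken as already summed over all CPUs, so it is NOT multiplied
# by the CPU count; wallclock time is charged per CPU.
cpu_report() {
    awk '
    {
        user  = $2
        cput  = $6
        wall  = $9
        ncpus = $12
        jobs[user]++
        reserved[user] += ncpus * wall   # CPU-seconds allocated
        used[user]     += cput           # CPU-seconds consumed
    }
    END {
        for (u in jobs) {
            eff = (reserved[u] > 0) ? 100 * used[u] / reserved[u] : 0
            printf("%s %d %.2f %.2f %.1f\n", u, jobs[u],
                   reserved[u] / 3600, used[u] / 3600, eff)
        }
    }' "$@" | sort
}
```

It would be fed the same way as pbsacct feeds its awk step, e.g. cat accounting-files | pbsjobs | cpu_report. A hypothetical pbsjobs line for the example job, "- mbenkiran mercator long 3 6096 - - 3082 - 1 6" (unused fields shown as "-"), yields mbenkiran 1 5.14 1.69 33.0: the job used about a third of the CPU time it held.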

