[torqueusers] Torque Monthly Usage Accounting
etienne gondet
etienne.gondet at mercator-ocean.fr
Fri Jan 6 07:49:23 MST 2006
Hello,
I just tried pbsacct. It is exactly the simple tool I was looking for.
I tried to add the total accumulated CPU time, and I believe there is a
mistake in the CPU computation.
In pbsjobs, cput is computed from the value of resources_used.cput,
which I take to be the total CPU time summed over all nodes and ppn.
Can anybody confirm this point?
                          Wallclock         Average Average      CPU
  Username    Group   #jobs   hours Percent  #nodes  q-days    hours
  --------    -----   ----- ------- ------- ------- -------    -----
     TOTAL        -    1876  8248.34 100.00    4.41    0.00  3017.38
     user1      red     745  3538.88  42.90    6.00    0.00  1229.49
     user2      red     285  2382.64  28.89    2.99    0.00  1103.90
But pbsacct multiplies cput again by the number of nodes:
line 108 cpunodes[user] += nodect*cput
line 116 cpunodesecs += nodect*cput
So I guess the following would be more accurate:
line 108 cpunodes[user] += cput
line 116 cpunodesecs += cput
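
To show how much the extra multiplication inflates the CPU column, here is a minimal sketch that accumulates CPU seconds both ways. The function name compare_cpu_sums and the two input records (user, nodect, cput in seconds) are made up for illustration; they are not the real pbsjobs output format.

```shell
#!/bin/sh
# Hypothetical simplified records: user, nodect, cput in seconds.
# They only illustrate the arithmetic; real pbsjobs output has more fields.
compare_cpu_sums() {
    awk '
    {
        scaled[$1] += $2 * $3   # as in the original lines: nodect*cput
        plain[$1]  += $3        # proposed change: cput only
    }
    END {
        for (u in plain)
            printf("%s scaled=%.2fh plain=%.2fh\n",
                   u, scaled[u]/3600, plain[u]/3600)
    }'
}

printf 'user1 3 6096\nuser1 1 3600\n' | compare_cpu_sums
```

With these two sample jobs the scaled sum is 6.08 hours against 2.69 hours for the plain sum, so the gap grows with the node count of each job.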
If I look at an accounting record, resources_used.cput=01:41:36 is greater
than resources_used.walltime=00:51:22.
That is why I think it is already the accumulated CPU time over all the
processors (nodes*ppn).
01/05/2006 02:18:34;E;20020.baltic;user=mbenkiran group=mercator
jobname=SAM1V2_UV queue=long ctime=1136424430 qtime=1136424431
etime=1136424431 start=1136424432
exec_host=baltic-05/1+baltic-05/0+baltic-04/1+baltic-04/0+baltic-03/1+baltic-03/0
Resource_List.cput=12:30:00 Resource_List.neednodes=3:ppn=2
Resource_List.nodect=3 Resource_List.nodes=3:ppn=2
Resource_List.pcput=03:00:00 Resource_List.pmem=5888mb
Resource_List.pvmem=5888mb Resource_List.walltime=03:00:00 session=0
end=1136427514 Exit_status=0 resources_used.cput=01:41:36
resources_used.mem=4095000kb resources_used.vmem=3003072kb
resources_used.walltime=00:51:22
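
The arithmetic behind that record can be checked with a short sketch. The helper name hms_to_secs is an assumption for illustration; the three input values are copied from the record above (cput, walltime, and 3 nodes with ppn=2).

```shell
#!/bin/sh
# Convert an HH:MM:SS string, as found in resources_used.*, to seconds.
hms_to_secs() {
    echo "$1" | awk -F: '{ print $1*3600 + $2*60 + $3 }'
}

cput=$(hms_to_secs 01:41:36)   # resources_used.cput
wall=$(hms_to_secs 00:51:22)   # resources_used.walltime
ncpus=6                        # Resource_List.nodes=3:ppn=2

echo "cput=${cput}s walltime=${wall}s"
# Compare cput with the CPU time physically available to the job,
# and with what multiplying cput by the CPU count would claim.
awk -v c="$cput" -v w="$wall" -v n="$ncpus" 'BEGIN {
    printf("cput/walltime = %.2f (of %d CPUs)\n", c / w, n)
    printf("walltime*ncpus = %d s, ncpus*cput = %d s\n", w * n, n * c)
}'
```

The job used 6096 CPU seconds in 3082 seconds of wallclock time, a ratio of about 1.98, which a single processor could not reach. Multiplying cput by the 6 CPUs would give 36576 s, more than the 18492 CPU seconds the job could have consumed at all.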
Happy New Year to all Torque users.
Ole Holm Nielsen wrote:
> hpc.group at gmail.com wrote:
>
>> Does anyone know how to generate an accurate torque monthly usage report
>> based on cpu number, not number of nodes for cluster and SMP machine? The
>> report will include userid, group, wall-clock (hours), cpu time (hours)
>> and cpu number. Pls let me know, thanks.
>
>
> I wrote some really simple PBS accounting scripts for PBS (Torque and PBSPro)
> some years ago, and this is what we still use. You may download the pbsacct
> package from ftp://ftp.fysik.dtu.dk/pub/PBS/
>
> Regards,
> Ole
>
-------------- next part --------------
#!/bin/sh
# Summarize USER accounting information from PBS accounting files
# located in $PBSHOME/server_priv/accounting/
# The accompanying script "pbsjobs" extracts simplified records
# of completed jobs.
# Usage: pbsacct <accounting-files>
# where <accounting-files> are daily PBS records (such as 20000705)
# Author: Ole.H.Nielsen at fysik.dtu.dk
# Thanks to: Miroslaw.Prywata at fuw.edu.pl
#---------------------------------------------------------------
#BINDIR=/usr/local/bin
BINDIR=/home/mercator/64/bin
GROUPID=""
if [ -z "$1" ] ; then
    echo "Usage: $0 [-g groupid] accounting-files";
    exit 1
fi
#
case $1 in
    -g) GROUPID=$2
        shift; shift;
esac
# Accounting-files:
ACCT_FILES=$*
NUM_FILES=$#
# Sanity check
for f in ${ACCT_FILES}
do
    # Quote the file name so unusual characters do not break the test
    if [ ! -r "$f" ]
    then
        echo "ERROR: File $f is unreadable:"
        ls -la "$f"
        exit 1
    fi
done
# The pbsjobs accounting-information extractor script:
# May be set by an environment variable.
if [ -z "${PBSJOBS}" ] ; then
    PBSJOBS="${BINDIR}/pbsjobs";
fi
if [ ! -x "${PBSJOBS}" ] ; then
    echo No ${PBSJOBS} executable found
    exit 1
fi
# A working file
JOBTEMP=/tmp/pbsjobs.$$
# Trap error signals:
trap "rm -f ${JOBTEMP}; exit 2" 1 2 3 14 15 19
#---------------------------------------------------------------
# List the input files
echo
echo "Portable Batch System USER accounting statistics"
echo "------------------------------------------------"
echo
echo A total of $NUM_FILES accounting files will be processed.
rm -f ${JOBTEMP}
cat ${ACCT_FILES} | ${PBSJOBS} > ${JOBTEMP}
cat ${JOBTEMP} | awk '
{
    if (NR == 1) firstdate=$7
    lastdate=$7
} END {
    printf("The first record is dated %s, last record is dated %s.\n",
        firstdate, lastdate)
}'
#---------------------------------------------------------------
echo
echo "                          Wallclock         Average Average      CPU"
echo "  Username    Group   #jobs   hours Percent  #nodes  q-days    hours"
echo "  --------    -----   ----- ------- ------- ------- -------    -----"
cat ${JOBTEMP} | awk -vGROUPID=$GROUPID '
{
    # Parse input data
    user = $2           # User name
    group = $3          # Group name
    queue = $4          # Queue name
    nodect = $5         # Number of nodes used
    cput = $6           # CPU time in seconds
    wall = $9           # Wallclock time in seconds
    wait = $11          # Waiting time in seconds
    total_ncpus = $12   # Total number of CPUs used (>=nodect)
    #
    # For accounting by number of CPUs instead of number of nodes:
    # Uncomment the following line:
    #ETG modif for SBU = walltime*NCPUS
    # nodect = total_ncpus
    nodect = total_ncpus
    username[user] = user
    groupname[user] = group
    jobs[user]++
    #ETG cpunodes[user] += nodect*cput
    cpunodes[user] += cput
    wallnodes[user] += nodect*wall
    wallcpu[user] += wall
    if (nodect < minnodes[user]) minnodes[user] = nodect
    if (nodect > maxnodes[user]) maxnodes[user] = nodect
    waittime[user] += wait
    totaljobs++
    totalwait += wait
    #ETG cpunodesecs += nodect*cput
    cpunodesecs += cput
    wallnodesecs += nodect*wall
    wallsecs += wall
} END {
    cpunodedays = cpunodesecs / 86400
    wallnodedays = wallnodesecs / 86400
    walldays = wallsecs / 86400
    groupjobs = 0
    groupdays = 0
    for (user in username) {
        if (length(GROUPID) > 0 && groupname[user] != GROUPID) continue
        # q-days: average wait per job in days (86400 seconds per day)
        if (wallcpu[user] > 0)
            printf("%10s %8s %7d %8.2f %6.2f %7.2f %7.2f %8.2f\n",
                username[user], groupname[user], jobs[user],
                wallnodes[user]/3600, wallnodes[user]/(864*wallnodedays),
                wallnodes[user]/wallcpu[user], waittime[user]/jobs[user]/86400,
                cpunodes[user]/3600)
        groupjobs += jobs[user]
        groupnodedays += wallnodes[user]/86400
        groupdays += wallcpu[user]/86400
        groupwait += waittime[user]
    }
    printf("%10s %8s %7d %8.2f %6.2f %7.2f %7.2f %8.2f\n",
        "TOTAL", "-", totaljobs, wallnodesecs/3600, 100,
        wallnodedays/walldays, totalwait/totaljobs/86400, cpunodesecs/3600)
    if (length(GROUPID) > 0 && groupjobs > 0)
        printf("%10s %8s %7d %8.2f %7.2f %7.2f %7.2f \n",
            "GROUP", GROUPID, groupjobs, groupnodedays,
            100*groupnodedays/wallnodedays,
            groupnodedays/groupdays, groupwait/groupjobs/86400)
} ' | sort -r -n -k 4
rm -f ${JOBTEMP}
exit 0