[torqueusers] Missing resources_used info in /var/spool/torque/server_priv/accounting

Jean-Christophe Ducom jcducom at gmail.com
Fri Jan 13 16:03:23 MST 2012


Hi-
Our HPC cluster is running Torque+Maui.
We installed the RPMs available for openSUSE 11.3:
----------------------------------
# rpm -qa | grep torque
torque-server-2.5.9-1.1.x86_64
torque-client-2.5.9-1.1.x86_64
torque-2.5.9-1.1.x86_64
torque-pam-2.5.9-1.1.x86_64
torque-gui-2.5.9-1.1.x86_64
libtorque2-2.5.9-1.1.x86_64
torque-mom-2.5.9-1.1.x86_64
torque-devel-2.5.9-1.1.x86_64
torque-scheduler-2.5.9-1.1.x86_64

# rpm -qa | grep maui
maui-client-3.3-21.3.x86_64
maui-3.3-21.3.x86_64
maui-devel-3.3-21.3.x86_64
-----------------------------------



While qstat -f job_id correctly returns the resource usage information for a running job, e.g.:
----------------------------------
# qstat -f 2193248
Job Id: 2193248.garibaldi01-adm.cluster.net
      Job_Name = test
      Job_Owner = xxx
      resources_used.cput = 05:15:42
      resources_used.mem = 62968kb
      resources_used.vmem = 266224kb
      resources_used.walltime = 04:08:52
      job_state = R
      queue = workq
      server = garibaldi01-adm.cluster.net
      Checkpoint = u
      ctime = Fri Jan 13 10:35:38 2012
[...]
-----------------------------------
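
For comparison with the accounting records below, the usage fields can be
pulled out on their own (the job id is just the example from above):
----------------------------------
# qstat -f 2193248 | grep resources_used
----------------------------------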




The /var/spool/torque/server_priv/accounting files are missing all of the
resources_used.{cput,mem,vmem,walltime} fields:
-----------------------------------
# tail /var/spool/torque/server_priv/accounting/20120113
[...]
01/13/2012 14:52:36;S;2292144.garibaldi01-adm.cluster.net;user=sgadvise
group=its jobname=SET_169644.job queue=workq ctime=1326494912
qtime=1326494912 etime=1326494912 start=1326495156
owner=sgadvise@node0675.cluster.net exec_host=node0970/6
Resource_List.cput=120:00:00 Resource_List.mem=8gb Resource_List.ncpus=1
Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
Resource_List.walltime=120:00:00
01/13/2012 14:52:36;Q;2294097.garibaldi01-adm.cluster.net;queue=workq
01/13/2012 14:52:36;E;2292033.garibaldi01-adm.cluster.net;user=sgadvise
group=its jobname=SET_169612.job queue=workq ctime=1326494899
qtime=1326494899 etime=1326494899 start=1326495134
owner=sgadvise@node0675.cluster.net exec_host=node0670/1
Resource_List.cput=120:00:00 Resource_List.mem=8gb Resource_List.ncpus=1
Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
Resource_List.walltime=120:00:00 session=5214 end=1326495156 Exit_status=0
01/13/2012 14:52:36;Q;2294098.garibaldi01-adm.cluster.net;queue=workq
01/13/2012 14:52:36;E;2292123.garibaldi01-adm.cluster.net;user=sgadvise
group=its jobname=SET_169638.job queue=workq ctime=1326494910
qtime=1326494910 etime=1326494910 start=1326495153
owner=sgadvise@node0675.cluster.net exec_host=node0948/2
Resource_List.cput=120:00:00 Resource_List.mem=8gb Resource_List.ncpus=1
Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
Resource_List.walltime=120:00:00 session=6168 end=1326495156 Exit_status=0
[...]
-----------------------------------------
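
For what it is worth, it is the whole file, not just this excerpt:
counting the E records that mention resources_used at all comes back zero
(assuming the standard semicolon-delimited accounting layout,
date;record type;job id;attributes):
-----------------------------------------
# grep ';E;' /var/spool/torque/server_priv/accounting/20120113 \
      | grep -c resources_used
-----------------------------------------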


I suspect something is missing from our configuration files, but I cannot
pinpoint it.
Thank you in advance for any help.
Best,
JC

-----------
Jean-Christophe Ducom, PhD
The Scripps Research Institute
10550 N. Torrey Pines Rd
La Jolla, CA 92037







Configuration files (maui.cfg, mom config, pbs server)
-------------------------------
# cat /var/spool/maui/maui.cfg
# maui.cfg 3.3.1

SERVERHOST            garibaldi01.scripps.edu
# primary admin must be first in list
ADMIN1                root

# Resource Manager Definition
RMTYPE[0] PBS

# Allocation Manager Definition
AMCFG[bank]  TYPE=NONE

## default is 10
#AMCFG[bank] TIMEOUT=30

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL        00:00:10

## specifies the number of scheduling iterations between
## scheduler-initiated node manager queries
NODEPOLLFREQUENCY 8

SERVERPORT            40559
SERVERMODE            NORMAL


USERCFG[DEFAULT] MAXJOB=800
USERCFG[DEFAULT] MAXPROC=800
USERCFG[DEFAULT] MAXNODE=400
USERCFG[DEFAULT] MAXMEM=3000

## specifies number of time a job will be allowed to fail in its
## start attempts before being deferred.
DEFERSTARTCOUNT 2

## specifies whether or not the scheduler will allow jobs to span
## more than one node
ENABLEMULTINODEJOBS TRUE

## specifies whether or not the scheduler will allow jobs to specify
## multiple independent resource requests
## (i.e., pbs jobs with resource specifications such as
## '-l nodes=3:fast+1:io')
ENABLEMULTIREQJOBS TRUE

## amount of time Maui will allow a job to exceed its wallclock limit
## before it is terminated
JOBMAXOVERRUN 2:00:00

# Admin: http://supercluster.org/mauidocs/a.esecurity.html

LOGFILE               /var/spool/maui/maui.log
LOGFILEMAXSIZE        100000000
LOGLEVEL              1
LOGFILEROLLDEPTH      10

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html

QUEUETIMEWEIGHT       1

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html

#FSPOLICY              PSDEDICATED
#FSDEPTH               7
#FSINTERVAL            86400
#FSDECAY               0.80

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html

# NONE SPECIFIED

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html

BACKFILLPOLICY        BESTFIT
# Dec 13, 2011
BACKFILLMETRIC PROCS

#RESERVATIONPOLICY     CURRENTHIGHEST
#RESERVATIONDEPTH       2
RESERVATIONPOLICY     NEVER

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
# NODEALLOCATIONPOLICY  MINRESOURCE

#NODEALLOCATIONPOLICY PRIORITY
#NODECFG[DEFAULT] PRIORITYF='CPROCS + AMEM - 10 * JOBCOUNT'

NODEALLOCATIONPOLICY  MINRESOURCE
#NODEAVAILABILITYPOLICY DEDICATED:PROCS COMBINED:MEM

# QOS: http://supercluster.org/mauidocs/7.3qos.html

# QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html

# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test]   17:00:00
# SRDAYS[test]      MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test]   0:30:00

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html

# USERCFG[DEFAULT]      FSTARGET=25.0
# USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
# GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch]       FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR



**********************************************
# cat /var/spool/torque/mom_priv/config
$pbsserver garibaldi01-adm
$log_keep_days 30
$prologalarm 90
$igncput true
$rcpcmd /usr/bin/rcp
$spool_as_final_name true
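
In case it helps with diagnosis, each mom's live configuration can also be
queried from the server host with momctl, the standard Torque diagnostic
client (level 3 is just a verbosity choice; node0970 is one of the
execution hosts from the accounting excerpt above):
-------------------------------
# momctl -d 3 -h node0970
-------------------------------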

**********************************************
# qmgr
Max open servers: 10239
Qmgr: p s
#
# Create queues and set their attributes.
#
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq resources_max.cput = 200000:00:00
set queue workq resources_max.mem = 3000gb
set queue workq resources_max.ncpus = 800
set queue workq resources_max.nodect = 200
set queue workq resources_max.nodes = 90:ppn=8
set queue workq resources_max.walltime = 400:00:00
set queue workq resources_default.cput = 01:00:00
set queue workq resources_default.mem = 2gb
set queue workq resources_default.ncpus = 1
set queue workq resources_default.nodect = 1
set queue workq resources_default.nodes = 1
set queue workq resources_default.walltime = 01:00:00
set queue workq max_user_run = 800
set queue workq keep_completed = 0
set queue workq enabled = True
set queue workq started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = garibaldi01-adm
set server default_queue = workq
set server log_events = 511
set server mail_from = hpc_ca
set server query_other_jobs = True
set server scheduler_iteration = 300
set server node_check_rate = 150
set server tcp_timeout = 6
set server mail_domain = scripps.edu
set server allow_node_submit = True
set server auto_node_np = True
set server next_job_number = 2296229
set server record_job_info = False
set server job_log_keep_days = 30
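
One more thing I can check, though this is only my assumption and not a
confirmed lead: as I understand it, the resources_used values in the E
record come from the job obituary the mom sends to the server at exit, so
obit-related errors in the server log around job completion times might be
telling:
-------------------------------
# grep -i obit /var/spool/torque/server_logs/20120113 | tail
-------------------------------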




