[Mauiusers] Maui crashes with error
Phillip Spencer Davis
psdavis at bsu.edu
Thu Nov 19 06:33:57 MST 2009
Hello,
For the past few months the Maui service has been crashing with the
following error:
11/18 10:24:30 INFO: active PBS job 3538 has been removed from the
queue. assuming successful completion
11/18 10:24:30 MJobProcessCompleted(3538)
11/18 10:24:30 MAMAllocJDebit(A,3538,SC,ErrMsg)
I'm running Torque-1.3.6 resource manager, Maui 3.2.6p21 on a ROCKS 5.1
cluster.
I've attached the output from showconfig. Any thoughts as to why Maui is
crashing?
# Maui version 3.2.6p21 (PID: 7380)
# global policies
REJECTNEGPRIOJOBS[0] FALSE
ENABLENEGJOBPRIORITY[0] FALSE
ENABLEMULTINODEJOBS[0] TRUE
ENABLEMULTIREQJOBS[0] TRUE
BFPRIORITYPOLICY[0] [NONE]
JOBPRIOACCRUALPOLICY QUEUEPOLICY
NODELOADPOLICY ADJUSTSTATE
USEMACHINESPEED FALSE
USESYSTEMQUEUETIME TRUE
USELOCALMACHINEPRIORITY FALSE
NODEUNTRACKEDLOADFACTOR 1.2
JOBNODEMATCHPOLICY[0]
JOBMAXSTARTTIME[0] INFINITY
METAMAXTASKS[0] 0
NODESETPOLICY[0] [NONE]
NODESETATTRIBUTE[0] [NONE]
NODESETLIST[0]
NODESETDELAY[0] 00:00:00
NODESETPRIORITYTYPE[0] MINLOSS
NODESETTOLERANCE[0] 0.00
BACKFILLPOLICY[0] BESTFIT
BACKFILLDEPTH[0] 0
BACKFILLPROCFACTOR[0] 0
BACKFILLMAXSCHEDULES[0] 10000
BACKFILLMETRIC[0] PROCS
BFCHUNKDURATION[0] 00:00:00
BFCHUNKSIZE[0] 0
PREEMPTPOLICY[0] REQUEUE
MINADMINSTIME[0] 00:00:00
RESOURCELIMITPOLICY[0]
NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT]
NODEALLOCATIONPOLICY[0] FIRSTAVAILABLE
TASKDISTRIBUTIONPOLICY[0] DEFAULT
RESERVATIONPOLICY[0] NEVER
RESERVATIONRETRYTIME[0] 00:00:00
RESERVATIONTHRESHOLDTYPE[0] NONE
RESERVATIONTHRESHOLDVALUE[0] 0
FSPOLICY [NONE]
FSPOLICY [NONE]
FSINTERVAL 12:00:00
FSDEPTH 8
FSDECAY 1.00
# Priority Weights
SERVICEWEIGHT[0] 1
TARGETWEIGHT[0] 1
CREDWEIGHT[0] 1
ATTRWEIGHT[0] 1
FSWEIGHT[0] 1
RESWEIGHT[0] 1
USAGEWEIGHT[0] 1
QUEUETIMEWEIGHT[0] 1
XFACTORWEIGHT[0] 0
SPVIOLATIONWEIGHT[0] 0
BYPASSWEIGHT[0] 0
TARGETQUEUETIMEWEIGHT[0] 0
TARGETXFACTORWEIGHT[0] 0
USERWEIGHT[0] 0
GROUPWEIGHT[0] 0
ACCOUNTWEIGHT[0] 0
QOSWEIGHT[0] 0
CLASSWEIGHT[0] 0
FSUSERWEIGHT[0] 0
FSGROUPWEIGHT[0] 0
FSACCOUNTWEIGHT[0] 0
FSQOSWEIGHT[0] 0
FSCLASSWEIGHT[0] 0
ATTRATTRWEIGHT[0] 0
ATTRSTATEWEIGHT[0] 0
NODEWEIGHT[0] 0
PROCWEIGHT[0] 0
MEMWEIGHT[0] 0
SWAPWEIGHT[0] 0
DISKWEIGHT[0] 0
PSWEIGHT[0] 0
PEWEIGHT[0] 0
WALLTIMEWEIGHT[0] 0
UPROCWEIGHT[0] 0
UJOBWEIGHT[0] 0
CONSUMEDWEIGHT[0] 0
USAGEEXECUTIONTIMEWEIGHT[0] 0
REMAININGWEIGHT[0] 0
PERCENTWEIGHT[0] 0
XFMINWCLIMIT[0] 00:02:00
# partition DEFAULT policies
REJECTNEGPRIOJOBS[1] FALSE
ENABLENEGJOBPRIORITY[1] FALSE
ENABLEMULTINODEJOBS[1] TRUE
ENABLEMULTIREQJOBS[1] FALSE
BFPRIORITYPOLICY[1] [NONE]
JOBPRIOACCRUALPOLICY QUEUEPOLICY
NODELOADPOLICY ADJUSTSTATE
JOBNODEMATCHPOLICY[1]
JOBMAXSTARTTIME[1] INFINITY
METAMAXTASKS[1] 0
NODESETPOLICY[1] [NONE]
NODESETATTRIBUTE[1] [NONE]
NODESETLIST[1]
NODESETDELAY[1] 00:00:00
NODESETPRIORITYTYPE[1] MINLOSS
NODESETTOLERANCE[1] 0.00
# Priority Weights
XFMINWCLIMIT[1] 00:00:00
RMAUTHTYPE[0] CHECKSUM
CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE]
CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE]
CLASSCFG[short] DEFAULT.FEATURES=[NONE]
CLASSCFG[gaussian] DEFAULT.FEATURES=[NONE]
CLASSCFG[bsu-research] DEFAULT.FEATURES=[NONE]
CLASSCFG[batch] DEFAULT.FEATURES=[NONE]
CLASSCFG[gaussbatch] DEFAULT.FEATURES=[NONE]
CLASSCFG[atk] DEFAULT.FEATURES=[NONE]
CLASSCFG[long] DEFAULT.FEATURES=[NONE]
CLASSCFG[routing] DEFAULT.FEATURES=[NONE]
CLASSCFG[matlab] DEFAULT.FEATURES=[NONE]
CLASSCFG[sequence] DEFAULT.FEATURES=[NONE]
QOSPRIORITY[0] 0
QOSQTWEIGHT[0] 0
QOSXFWEIGHT[0] 0
QOSTARGETXF[0] 0.00
QOSTARGETQT[0] 00:00:00
QOSFLAGS[0]
QOSPRIORITY[1] 0
QOSQTWEIGHT[1] 0
QOSXFWEIGHT[1] 0
QOSTARGETXF[1] 0.00
QOSTARGETQT[1] 00:00:00
QOSFLAGS[1]
QOSPRIORITY[2] 100
QOSQTWEIGHT[2] 0
QOSXFWEIGHT[2] 0
QOSTARGETXF[2] 0.00
QOSTARGETQT[2] 00:00:00
QOSFLAGS[2] PREEMPTOR
QOSPRIORITY[3] 500
QOSQTWEIGHT[3] 0
QOSXFWEIGHT[3] 0
QOSTARGETXF[3] 0.00
QOSTARGETQT[3] 00:00:00
QOSFLAGS[3] PREEMPTOR
QOSPRIORITY[4] -1000
QOSQTWEIGHT[4] 0
QOSXFWEIGHT[4] 0
QOSTARGETXF[4] 0.00
QOSTARGETQT[4] 00:00:00
QOSFLAGS[4] PREEMPTEE
# SERVER MODULES: MX
SERVERMODE NORMAL
SERVERNAME
SERVERHOST ccncluster.bsu.edu
SERVERPORT 42559
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGFILEROLLDEPTH 1
LOGLEVEL 3
LOGFACILITY fALL
SERVERHOMEDIR /usr/local/maui/
TOOLSDIR /usr/local/maui/tools/
LOGDIR /usr/local/maui/log/
STATDIR /usr/local/maui/stats/
LOCKFILE /usr/local/maui/maui.pid
SERVERCONFIGFILE /usr/local/maui/maui.cfg
CHECKPOINTFILE /usr/local/maui/maui.ck
CHECKPOINTINTERVAL 00:05:00
CHECKPOINTEXPIRATIONTIME 3:11:20:00
TRAPJOB
TRAPNODE
TRAPFUNCTION
RESDEPTH 24
RMPOLLINTERVAL 00:00:30
NODEACCESSPOLICY SHARED
ALLOCLOCALITYPOLICY [NONE]
SIMTIMEPOLICY [NONE]
ADMIN1 mauiuser root psdavis
ADMINHOSTS ALL
NODEPOLLFREQUENCY 0
DISPLAYFLAGS
DEFAULTDOMAIN
DEFAULTCLASSLIST [DEFAULT:1]
FEATURENODETYPEHEADER
FEATUREPROCSPEEDHEADER
FEATUREPARTITIONHEADER
DEFERTIME 1:00:00
DEFERCOUNT 24
DEFERSTARTCOUNT 1
JOBPURGETIME 0
NODEPURGETIME 2140000000
APIFAILURETHRESHHOLD 6
NODESYNCTIME 600
JOBSYNCTIME 600
JOBMAXOVERRUN 00:10:00
NODEMAXLOAD 0.0
PLOTMINTIME 120
PLOTMAXTIME 245760
PLOTTIMESCALE 11
PLOTMINPROC 1
PLOTMAXPROC 512
PLOTPROCSCALE 9
SCHEDCFG[] MODE=NORMAL
SERVER=ccncluster.bsu.edu:42559
# RM MODULES: PBS SSS WIKI NATIVE
RMCFG[base] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS
SIMWORKLOADTRACEFILE workload
SIMRESOURCETRACEFILE resource
SIMAUTOSHUTDOWN OFF
SIMSTARTTIME 0
SIMSCALEJOBRUNTIME FALSE
SIMFLAGS
SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
SIMINITIALQUEUEDEPTH 16
SIMWCACCURACY 0.00
SIMWCACCURACYCHANGE 0.00
SIMNODECOUNT 0
SIMNODECONFIGURATION NORMAL
SIMWCSCALINGPERCENT 100
SIMCOMRATE 0.10
SIMCOMTYPE ROUNDROBIN
COMINTRAFRAMECOST 0.30
COMINTERFRAMECOST 0.30
SIMSTOPITERATION -1
SIMEXITITERATION -1
More information about the mauiusers
mailing list