[torqueusers] Multi core jobs are not scheduled by Maui

James A. Peltier jpeltier at sfu.ca
Thu Oct 24 12:12:28 MDT 2013


Your schedctl output shows ENABLEMULTIREQJOBS set to FALSE; it should be TRUE.
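
In maui.cfg, parameters are written as a name followed by a value, so the line would presumably read `ENABLEMULTIREQJOBS TRUE`, and Maui would typically need a restart to pick it up. To confirm what the running scheduler reports, a small helper along these lines can pull one parameter out of a configuration listing. This is only a sketch: `maui_param_value` is a hypothetical name, and it assumes the listing has the same `NAME[index] VALUE` layout as the output quoted below.

```shell
#!/bin/sh
# Hypothetical helper (not part of Maui): read a configuration listing on
# stdin and print the value of the first line whose parameter name matches $1.
# A trailing partition index such as "[0]" is stripped before comparing.
maui_param_value() {
    awk -v p="$1" '
        {
            name = $1
            sub(/\[[0-9]+\]$/, "", name)   # ENABLEMULTIREQJOBS[0] -> ENABLEMULTIREQJOBS
            if (name == p) { print $2; exit }
        }'
}

# Example use, assuming Maui's showconfig command is on the PATH:
#   showconfig | maui_param_value ENABLEMULTIREQJOBS
```

Because the helper stops at the first match, it reports the global `[0]` entry rather than any per-partition entries that follow.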

----- Original Message -----

| Hello

| We have a small 60-node cluster with 12 cores per node.

| We are using Torque 2.5.12 and Maui 3.3

| When I submit multi-core jobs to Torque using a #PBS directive,
| the jobs stay queued. I can run the job with qsub in
| interactive mode.

| #PBS -l "nodes=1:ppn=4"

| The error I get in the Maui log file is as follows:

| 10/24 20:16:10 ALERT: job 427 cannot run in any partition
| 10/24 20:16:10 ALERT: cannot create new reservation for job 427 (shape[1] 4)
| 10/24 20:16:10 ALERT: cannot create new reservation for job 427
| 10/24 20:16:10 ALERT: job '427' cannot run (deferring job for 3600 seconds)

| -----------------------------------------------------------------------------------------------------
| showres shows the following:

| Reservations

| ReservationID Type S Start End Duration N/P StartTime

| 0 reservations located

| ---------------------------------------------------------------------------------------------------------------------

| Showbf gives the following error:

| Node is blocked by reservation NONE in INFINITY

| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------

| checkjob gives the following

| checking job 427

| State: Idle EState: Deferred
| Creds: user:guestuser2 group:guestuser2 class:p60 qos:DEFAULT
| WallTime: 00:00:00 of 1:00:00
| SubmitTime: Thu Oct 24 11:11:50
| (Time Queued Total: 10:03:14 Eligible: 00:00:00)

| Total Tasks: 4

| Req[0] TaskCount: 4 Partition: ALL
| Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
| Opsys: [NONE] Arch: [NONE] Features: [NONE]

| IWD: [NONE] Executable: [NONE]
| Bypass: 0 StartCount: 0
| PartitionMask: [ALL]
| Flags: RESTARTABLE

| job is deferred. Reason: NoResources (cannot create reservation for
| job '427' (intital reservation attempt)
| )
| Holds: Defer (hold reason: NoResources)
| PE: 4.00 StartPriority: 58
| cannot select job 427 for partition DEFAULT (job hold active)

| -----------------------------------------------------------------------------------------------------------------------------------------------

| My queue configuration is as follows

| queue_type = Execution
| max_queuable = 120
| max_user_queuable = 120
| total_jobs = 3
| state_count = Transit:0 Queued:-2 Held:0 Waiting:0 Running:5 Exiting:0
| max_running = 20
| resources_max.ncpus = 64
| resources_max.walltime = 48:00:00
| resources_default.neednodes = 1
| resources_default.nodect = 1
| resources_default.nodes = 1
| resources_default.walltime = 01:00:00
| mtime = Thu Oct 24 15:28:41 2013
| resources_assigned.nodect = 0
| max_user_run = 10
| enabled = True
| started = True
| ----------------------------------------------------------------------------------------------------------------------------------

| My PBS server configuration is as follows

| server_state = Idle
| total_jobs = 3
| state_count = Transit:0 Queued:-129 Held:127 Waiting:0 Running:5 Exiting:0
| acl_hosts = krplpadul001
| default_queue = p60
| log_events = 511
| mail_from = adm
| resources_assigned.nodect = 0
| scheduler_iteration = 600
| node_check_rate = 150
| tcp_timeout = 6
| pbs_version = 2.5.12
| auto_node_np = False
| next_job_number = 428
| net_counter = 12 4 2
| record_job_info = True
| record_job_script = True
| job_log_keep_days = 7

| -------------------------------------------------------------------------------------------------------------------------------------------------------
| Maui schedctl shows the following

| REJECTNEGPRIOJOBS[0] FALSE
| ENABLENEGJOBPRIORITY[0] FALSE
| ENABLEMULTINODEJOBS[0] TRUE
| ENABLEMULTIREQJOBS[0] FALSE
| BFPRIORITYPOLICY[0] [NONE]
| JOBPRIOACCRUALPOLICY QUEUEPOLICY
| NODELOADPOLICY ADJUSTSTATE
| USEMACHINESPEED FALSE
| USESYSTEMQUEUETIME TRUE
| USELOCALMACHINEPRIORITY FALSE
| NODEUNTRACKEDLOADFACTOR 1.2
| JOBNODEMATCHPOLICY[0]

| JOBMAXSTARTTIME[0] INFINITY

| METAMAXTASKS[0] 0
| NODESETPOLICY[0] [NONE]
| NODESETATTRIBUTE[0] [NONE]
| NODESETLIST[0]
| NODESETDELAY[0] 00:00:00
| NODESETPRIORITYTYPE[0] MINLOSS
| NODESETTOLERANCE[0] 0.00

| BACKFILLPOLICY[0] FIRSTFIT
| BACKFILLDEPTH[0] 0
| BACKFILLPROCFACTOR[0] 0
| BACKFILLMAXSCHEDULES[0] 10000
| BACKFILLMETRIC[0] PROCS

| BFCHUNKDURATION[0] 00:00:00
| BFCHUNKSIZE[0] 0
| PREEMPTPOLICY[0] REQUEUE
| MINADMINSTIME[0] 00:00:00
| RESOURCELIMITPOLICY[0]
| NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT]
| NODEALLOCATIONPOLICY[0] MINRESOURCE
| TASKDISTRIBUTIONPOLICY[0] DEFAULT
| RESERVATIONPOLICY[0] CURRENTHIGHEST
| RESERVATIONRETRYTIME[0] 00:00:00
| RESERVATIONTHRESHOLDTYPE[0] NONE
| RESERVATIONTHRESHOLDVALUE[0] 0

| FSPOLICY [NONE]
| FSPOLICY [NONE]
| FSINTERVAL 12:00:00
| FSDEPTH 8
| FSDECAY 1.00

| # Priority Weights

| SERVICEWEIGHT[0] 1
| TARGETWEIGHT[0] 1
| CREDWEIGHT[0] 1
| ATTRWEIGHT[0] 1
| FSWEIGHT[0] 1
| RESWEIGHT[0] 1
| USAGEWEIGHT[0] 1
| QUEUETIMEWEIGHT[0] 1
| XFACTORWEIGHT[0] 0
| SPVIOLATIONWEIGHT[0] 0
| BYPASSWEIGHT[0] 0
| TARGETQUEUETIMEWEIGHT[0] 0
| TARGETXFACTORWEIGHT[0] 0
| USERWEIGHT[0] 0
| GROUPWEIGHT[0] 0
| ACCOUNTWEIGHT[0] 0
| QOSWEIGHT[0] 0
| CLASSWEIGHT[0] 0
| FSUSERWEIGHT[0] 0
| FSGROUPWEIGHT[0] 0
| FSACCOUNTWEIGHT[0] 0
| FSQOSWEIGHT[0] 0
| FSCLASSWEIGHT[0] 0
| ATTRATTRWEIGHT[0] 0
| ATTRSTATEWEIGHT[0] 0
| NODEWEIGHT[0] 0
| PROCWEIGHT[0] 0
| MEMWEIGHT[0] 0
| SWAPWEIGHT[0] 0
| DISKWEIGHT[0] 0
| PSWEIGHT[0] 0
| PEWEIGHT[0] 0
| WALLTIMEWEIGHT[0] 0
| UPROCWEIGHT[0] 0
| UJOBWEIGHT[0] 0
| CONSUMEDWEIGHT[0] 0
| USAGEEXECUTIONTIMEWEIGHT[0] 0
| REMAININGWEIGHT[0] 0
| PERCENTWEIGHT[0] 0
| XFMINWCLIMIT[0] 00:02:00

| # partition DEFAULT policies

| REJECTNEGPRIOJOBS[1] FALSE
| ENABLENEGJOBPRIORITY[1] FALSE
| ENABLEMULTINODEJOBS[1] TRUE
| ENABLEMULTIREQJOBS[1] FALSE
| BFPRIORITYPOLICY[1] [NONE]
| JOBPRIOACCRUALPOLICY QUEUEPOLICY
| NODELOADPOLICY ADJUSTSTATE
| JOBNODEMATCHPOLICY[1]

| JOBMAXSTARTTIME[1] INFINITY

| METAMAXTASKS[1] 0
| NODESETPOLICY[1] [NONE]
| NODESETATTRIBUTE[1] [NONE]
| NODESETLIST[1]
| NODESETDELAY[1] 00:00:00
| NODESETPRIORITYTYPE[1] MINLOSS
| NODESETTOLERANCE[1] 0.00

| # Priority Weights

| XFMINWCLIMIT[1] 00:00:00

| RMAUTHTYPE[0] CHECKSUM

| CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE]
| CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE]
| CLASSCFG[p60] DEFAULT.FEATURES=[NONE]
| CLASSCFG[long] DEFAULT.FEATURES=[NONE]
| QOSPRIORITY[0] 0
| QOSQTWEIGHT[0] 0
| QOSXFWEIGHT[0] 0
| QOSTARGETXF[0] 0.00
| QOSTARGETQT[0] 00:00:00
| QOSFLAGS[0]
| QOSPRIORITY[1] 0
| QOSQTWEIGHT[1] 0
| QOSXFWEIGHT[1] 0
| QOSTARGETXF[1] 0.00
| QOSTARGETQT[1] 00:00:00
| QOSFLAGS[1]
| # SERVER MODULES: MX
| SERVERMODE NORMAL
| SERVERNAME
| SERVERHOST krplpadul001
| SERVERPORT 42559
| LOGFILE maui.log
| LOGFILEMAXSIZE 10000000
| LOGFILEROLLDEPTH 1
| LOGLEVEL 3
| LOGFACILITY fALL
| SERVERHOMEDIR /usr/local/maui/
| TOOLSDIR /usr/local/maui/tools/
| LOGDIR /usr/local/maui/log/
| STATDIR /usr/local/maui/stats/
| LOCKFILE /usr/local/maui/maui.pid
| SERVERCONFIGFILE /usr/local/maui/maui.cfg
| CHECKPOINTFILE /usr/local/maui/ maui.ck
| CHECKPOINTINTERVAL 00:05:00
| CHECKPOINTEXPIRATIONTIME 3:11:20:00
| TRAPJOB
| TRAPNODE
| TRAPFUNCTION
| RESDEPTH 24

| RMPOLLINTERVAL 00:00:30
| NODEACCESSPOLICY SHARED
| ALLOCLOCALITYPOLICY [NONE]
| SIMTIMEPOLICY [NONE]
| ADMIN1 root
| ADMINHOSTS ALL
| NODEPOLLFREQUENCY 0
| DISPLAYFLAGS
| DEFAULTDOMAIN
| DEFAULTCLASSLIST [DEFAULT:1]
| FEATURENODETYPEHEADER
| FEATUREPROCSPEEDHEADER
| FEATUREPARTITIONHEADER
| DEFERTIME 1:00:00
| DEFERCOUNT 24
| DEFERSTARTCOUNT 1
| JOBPURGETIME 0
| NODEPURGETIME 2140000000
| APIFAILURETHRESHHOLD 6
| NODESYNCTIME 600
| JOBSYNCTIME 600
| JOBMAXOVERRUN 00:10:00
| NODEMAXLOAD 0.0

| PLOTMINTIME 120
| PLOTMAXTIME 245760
| PLOTTIMESCALE 11
| PLOTMINPROC 1
| PLOTMAXPROC 512
| PLOTPROCSCALE 9
| SCHEDCFG[] MODE=NORMAL SERVER=krplpadul001:42559
| # RM MODULES: PBS SSS WIKI NATIVE
| RMCFG[KRPLPADUL001] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS
| SIMWORKLOADTRACEFILE workload
| SIMRESOURCETRACEFILE resource
| SIMAUTOSHUTDOWN OFF
| SIMSTARTTIME 0
| SIMSCALEJOBRUNTIME FALSE
| SIMFLAGS
| SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
| SIMINITIALQUEUEDEPTH 16
| SIMWCACCURACY 0.00
| SIMWCACCURACYCHANGE 0.00
| SIMNODECOUNT 0
| SIMNODECONFIGURATION NORMAL
| SIMWCSCALINGPERCENT 100
| SIMCOMRATE 0.10
| SIMCOMTYPE ROUNDROBIN
| COMINTRAFRAMECOST 0.30
| COMINTERFRAMECOST 0.30
| SIMSTOPITERATION -1
| SIMEXITITERATION -1

| What am I missing that prevents Maui from scheduling these jobs?

| Sincerely,

| Amjad


-- 

James A. Peltier 
Manager, IT Services - Research Computing Group 
Simon Fraser University - Burnaby Campus 
Phone : 778-782-6573 
Fax : 778-782-3045 
E-Mail : jpeltier at sfu.ca 
Website : http://www.sfu.ca/itservices 

“A successful person is one who can lay a solid foundation from the bricks others have thrown at them.” -David Brinkley via Luke Shaw 