[torqueusers] Multi core jobs are not scheduled by Maui

Gus Correa gus at ldeo.columbia.edu
Thu Oct 24 12:34:36 MDT 2013


Did you enable scheduling?
On the Torque server:

qmgr -c 'set server scheduling = True'
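
To check the current setting before changing it, something along these lines may help. It is only a sketch: `scheduling_enabled` is a hypothetical helper name (not part of Torque), and it assumes `qmgr -c 'print server'` reports the attribute as a `set server scheduling = ...` line. The server settings quoted below do not list a `scheduling` attribute, which would be consistent with it being unset.

```shell
#!/bin/sh
# Hypothetical helper (not a Torque command): reads the output of
# `qmgr -c 'print server'` on stdin and succeeds only if a line sets
# the server attribute "scheduling" to True (case-insensitive).
scheduling_enabled() {
    grep -Eiq '^[[:space:]]*set server scheduling = true[[:space:]]*$'
}

# Query the server only when qmgr is actually installed on this host.
if command -v qmgr >/dev/null 2>&1; then
    if qmgr -c 'print server' | scheduling_enabled; then
        echo "scheduling is enabled"
    else
        echo "scheduling is not enabled; try: qmgr -c 'set server scheduling = True'"
    fi
fi
```

If the attribute is missing from the output, or the server cannot be reached, the helper reports it as not enabled.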

I hope it helps,
Gus Correa

On 10/24/2013 02:10 PM, Amjad Syed wrote:
> Hello
>
> We have a small 60-node cluster with 12 cores per node.
>
> We are using Torque 2.5.12 and Maui 3.3.
>
> When I submit multi-core jobs to Torque using a #PBS directive,
> the jobs stay queued. I can run the job with qsub in
> interactive mode.
>
> #PBS -l "nodes=1:ppn=4"
>
> The error I get in the Maui log file is as follows:
>
>
> 10/24 20:16:10 ALERT:    job 427 cannot run in any partition
> 10/24 20:16:10 ALERT:    cannot create new reservation for job 427 (shape[1] 4)
> 10/24 20:16:10 ALERT:    cannot create new reservation for job 427
> 10/24 20:16:10 ALERT:    job '427' cannot run (deferring job for 3600 seconds)
>
> -----------------------------------------------------------------------------------------------------
> showres shows the following:
>
> Reservations
>
> ReservationID       Type S       Start         End    Duration    N/P    StartTime
>
>
> 0 reservations located
>
> ---------------------------------------------------------------------------------------------------------------------
>
> Showbf gives the following error:
>
> Node is blocked  by reservation NONE in INFINITY
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> checkjob  gives the following
>
> checking job 427
>
> State: Idle  EState: Deferred
> Creds:  user:guestuser2  group:guestuser2  class:p60  qos:DEFAULT
> WallTime: 00:00:00 of 1:00:00
> SubmitTime: Thu Oct 24 11:11:50
>    (Time Queued  Total: 10:03:14  Eligible: 00:00:00)
>
> Total Tasks: 4
>
> Req[0]  TaskCount: 4  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
>
>
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 0  StartCount: 0
> PartitionMask: [ALL]
> Flags:       RESTARTABLE
>
> job is deferred.  Reason:  NoResources  (cannot create reservation for
> job '427' (intital reservation attempt)
> )
> Holds:    Defer  (hold reason:  NoResources)
> PE:  4.00  StartPriority:  58
> cannot select job 427 for partition DEFAULT (job hold active)
>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> My queue configuration is as follows
>
>   queue_type = Execution
>          max_queuable = 120
>          max_user_queuable = 120
>          total_jobs = 3
>          state_count = Transit:0 Queued:-2 Held:0 Waiting:0 Running:5 Exiting:0
>          max_running = 20
>          resources_max.ncpus = 64
>          resources_max.walltime = 48:00:00
>          resources_default.neednodes = 1
>          resources_default.nodect = 1
>          resources_default.nodes = 1
>          resources_default.walltime = 01:00:00
>          mtime = Thu Oct 24 15:28:41 2013
>          resources_assigned.nodect = 0
>          max_user_run = 10
>          enabled = True
>          started = True
> ----------------------------------------------------------------------------------------------------------------------------------
>
> My PBS server configuration is as follows
>
>   server_state = Idle
>          total_jobs = 3
>          state_count = Transit:0 Queued:-129 Held:127 Waiting:0 Running:5 Exiting:0
>          acl_hosts = krplpadul001
>          default_queue = p60
>          log_events = 511
>          mail_from = adm
>          resources_assigned.nodect = 0
>          scheduler_iteration = 600
>          node_check_rate = 150
>          tcp_timeout = 6
>          pbs_version = 2.5.12
>          auto_node_np = False
>          next_job_number = 428
>          net_counter = 12 4 2
>          record_job_info = True
>          record_job_script = True
>          job_log_keep_days = 7
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> Maui schedctl shows the following
>
>
> REJECTNEGPRIOJOBS[0]              FALSE
> ENABLENEGJOBPRIORITY[0]           FALSE
> ENABLEMULTINODEJOBS[0]            TRUE
> ENABLEMULTIREQJOBS[0]             FALSE
> BFPRIORITYPOLICY[0]               [NONE]
> JOBPRIOACCRUALPOLICY            QUEUEPOLICY
> NODELOADPOLICY                  ADJUSTSTATE
> USEMACHINESPEED                 FALSE
> USESYSTEMQUEUETIME              TRUE
> USELOCALMACHINEPRIORITY         FALSE
> NODEUNTRACKEDLOADFACTOR         1.2
> JOBNODEMATCHPOLICY[0]
>
> JOBMAXSTARTTIME[0]                  INFINITY
>
> METAMAXTASKS[0]                   0
> NODESETPOLICY[0]                  [NONE]
> NODESETATTRIBUTE[0]               [NONE]
> NODESETLIST[0]
> NODESETDELAY[0]                   00:00:00
> NODESETPRIORITYTYPE[0]            MINLOSS
> NODESETTOLERANCE[0]                 0.00
>
> BACKFILLPOLICY[0]                 FIRSTFIT
> BACKFILLDEPTH[0]                  0
> BACKFILLPROCFACTOR[0]             0
> BACKFILLMAXSCHEDULES[0]           10000
> BACKFILLMETRIC[0]                 PROCS
>
> BFCHUNKDURATION[0]                00:00:00
> BFCHUNKSIZE[0]                    0
> PREEMPTPOLICY[0]                  REQUEUE
> MINADMINSTIME[0]                  00:00:00
> RESOURCELIMITPOLICY[0]
> NODEAVAILABILITYPOLICY[0]         COMBINED:[DEFAULT]
> NODEALLOCATIONPOLICY[0]           MINRESOURCE
> TASKDISTRIBUTIONPOLICY[0]         DEFAULT
> RESERVATIONPOLICY[0]              CURRENTHIGHEST
> RESERVATIONRETRYTIME[0]           00:00:00
> RESERVATIONTHRESHOLDTYPE[0]       NONE
> RESERVATIONTHRESHOLDVALUE[0]      0
>
> FSPOLICY                        [NONE]
> FSPOLICY                        [NONE]
> FSINTERVAL                      12:00:00
> FSDEPTH                         8
> FSDECAY                         1.00
>
>
>
> # Priority Weights
>
> SERVICEWEIGHT[0]                  1
> TARGETWEIGHT[0]                   1
> CREDWEIGHT[0]                     1
> ATTRWEIGHT[0]                     1
> FSWEIGHT[0]                       1
> RESWEIGHT[0]                      1
> USAGEWEIGHT[0]                    1
> QUEUETIMEWEIGHT[0]                1
> XFACTORWEIGHT[0]                  0
> SPVIOLATIONWEIGHT[0]              0
> BYPASSWEIGHT[0]                   0
> TARGETQUEUETIMEWEIGHT[0]          0
> TARGETXFACTORWEIGHT[0]            0
> USERWEIGHT[0]                     0
> GROUPWEIGHT[0]                    0
> ACCOUNTWEIGHT[0]                  0
> QOSWEIGHT[0]                      0
> CLASSWEIGHT[0]                    0
> FSUSERWEIGHT[0]                   0
> FSGROUPWEIGHT[0]                  0
> FSACCOUNTWEIGHT[0]                0
> FSQOSWEIGHT[0]                    0
> FSCLASSWEIGHT[0]                  0
> ATTRATTRWEIGHT[0]                 0
> ATTRSTATEWEIGHT[0]                0
> NODEWEIGHT[0]                     0
> PROCWEIGHT[0]                     0
> MEMWEIGHT[0]                      0
> SWAPWEIGHT[0]                     0
> DISKWEIGHT[0]                     0
> PSWEIGHT[0]                       0
> PEWEIGHT[0]                       0
> WALLTIMEWEIGHT[0]                 0
> UPROCWEIGHT[0]                    0
> UJOBWEIGHT[0]                     0
> CONSUMEDWEIGHT[0]                 0
> USAGEEXECUTIONTIMEWEIGHT[0]       0
> REMAININGWEIGHT[0]                0
> PERCENTWEIGHT[0]                  0
> XFMINWCLIMIT[0]                   00:02:00
>
>
> # partition DEFAULT policies
>
> REJECTNEGPRIOJOBS[1]              FALSE
> ENABLENEGJOBPRIORITY[1]           FALSE
> ENABLEMULTINODEJOBS[1]            TRUE
> ENABLEMULTIREQJOBS[1]             FALSE
> BFPRIORITYPOLICY[1]               [NONE]
> JOBPRIOACCRUALPOLICY            QUEUEPOLICY
> NODELOADPOLICY                  ADJUSTSTATE
> JOBNODEMATCHPOLICY[1]
>
> JOBMAXSTARTTIME[1]                  INFINITY
>
> METAMAXTASKS[1]                   0
> NODESETPOLICY[1]                  [NONE]
> NODESETATTRIBUTE[1]               [NONE]
> NODESETLIST[1]
> NODESETDELAY[1]                   00:00:00
> NODESETPRIORITYTYPE[1]            MINLOSS
> NODESETTOLERANCE[1]                 0.00
>
> # Priority Weights
>
> XFMINWCLIMIT[1]                   00:00:00
>
> RMAUTHTYPE[0]                     CHECKSUM
>
> CLASSCFG[[NONE]]  DEFAULT.FEATURES=[NONE]
> CLASSCFG[[ALL]]  DEFAULT.FEATURES=[NONE]
> CLASSCFG[p60]  DEFAULT.FEATURES=[NONE]
> CLASSCFG[long]  DEFAULT.FEATURES=[NONE]
> QOSPRIORITY[0]                    0
> QOSQTWEIGHT[0]                    0
> QOSXFWEIGHT[0]                    0
> QOSTARGETXF[0]                      0.00
> QOSTARGETQT[0]                    00:00:00
> QOSFLAGS[0]
> QOSPRIORITY[1]                    0
> QOSQTWEIGHT[1]                    0
> QOSXFWEIGHT[1]                    0
> QOSTARGETXF[1]                      0.00
> QOSTARGETQT[1]                    00:00:00
> QOSFLAGS[1]
> # SERVER MODULES:  MX
> SERVERMODE                      NORMAL
> SERVERNAME
> SERVERHOST                      krplpadul001
> SERVERPORT                      42559
> LOGFILE                         maui.log
> LOGFILEMAXSIZE                  10000000
> LOGFILEROLLDEPTH                1
> LOGLEVEL                        3
> LOGFACILITY                     fALL
> SERVERHOMEDIR                   /usr/local/maui/
> TOOLSDIR                        /usr/local/maui/tools/
> LOGDIR                          /usr/local/maui/log/
> STATDIR                         /usr/local/maui/stats/
> LOCKFILE                        /usr/local/maui/maui.pid
> SERVERCONFIGFILE                /usr/local/maui/maui.cfg
> CHECKPOINTFILE                  /usr/local/maui/maui.ck
> CHECKPOINTINTERVAL              00:05:00
> CHECKPOINTEXPIRATIONTIME        3:11:20:00
> TRAPJOB
> TRAPNODE
> TRAPFUNCTION
> RESDEPTH                        24
>
> RMPOLLINTERVAL                  00:00:30
> NODEACCESSPOLICY                SHARED
> ALLOCLOCALITYPOLICY             [NONE]
> SIMTIMEPOLICY                   [NONE]
> ADMIN1                          root
> ADMINHOSTS                      ALL
> NODEPOLLFREQUENCY               0
> DISPLAYFLAGS
> DEFAULTDOMAIN
> DEFAULTCLASSLIST                [DEFAULT:1]
> FEATURENODETYPEHEADER
> FEATUREPROCSPEEDHEADER
> FEATUREPARTITIONHEADER
> DEFERTIME                       1:00:00
> DEFERCOUNT                      24
> DEFERSTARTCOUNT                 1
> JOBPURGETIME                    0
> NODEPURGETIME                   2140000000
> APIFAILURETHRESHHOLD            6
> NODESYNCTIME                    600
> JOBSYNCTIME                     600
> JOBMAXOVERRUN                   00:10:00
> NODEMAXLOAD                     0.0
>
> PLOTMINTIME                     120
> PLOTMAXTIME                     245760
> PLOTTIMESCALE                   11
> PLOTMINPROC                     1
> PLOTMAXPROC                     512
> PLOTPROCSCALE                   9
> SCHEDCFG[]                        MODE=NORMAL SERVER=krplpadul001:42559
> # RM MODULES: PBS SSS WIKI NATIVE
> RMCFG[KRPLPADUL001] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS
> SIMWORKLOADTRACEFILE            workload
> SIMRESOURCETRACEFILE            resource
> SIMAUTOSHUTDOWN                 OFF
> SIMSTARTTIME                    0
> SIMSCALEJOBRUNTIME              FALSE
> SIMFLAGS
> SIMJOBSUBMISSIONPOLICY          CONSTANTJOBDEPTH
> SIMINITIALQUEUEDEPTH            16
> SIMWCACCURACY                   0.00
> SIMWCACCURACYCHANGE             0.00
> SIMNODECOUNT                    0
> SIMNODECONFIGURATION            NORMAL
> SIMWCSCALINGPERCENT             100
> SIMCOMRATE                      0.10
> SIMCOMTYPE                      ROUNDROBIN
> COMINTRAFRAMECOST               0.30
> COMINTERFRAMECOST               0.30
> SIMSTOPITERATION                -1
> SIMEXITITERATION                -1
>
>
>
> What am I missing here that prevents Maui from scheduling jobs?
>
> Sincerely,
> Amjad
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


