[torqueusers] Multi-core jobs are not scheduled by Maui

Amjad Syed amjadcsu at gmail.com
Thu Oct 24 12:59:10 MDT 2013


Thanks, Gus and James. That solved my problem.

Amjad
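Gus's fix in the quoted reply below is a single qmgr command run on the Torque server. The server listing pasted in the original question contains no `scheduling` line at all, which is consistent with scheduling never having been enabled. A minimal sketch, assuming a POSIX shell and that `qmgr -c 'list server'` prints attributes as `name = value` lines like the pasted listing, of a hypothetical helper `scheduling_enabled` that checks for the setting before changing it:

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the text on stdin (for example the output
# of `qmgr -c 'list server'`) contains a line "scheduling = True", else 1.
scheduling_enabled() {
  grep -Eq '^[[:space:]]*scheduling[[:space:]]*=[[:space:]]*True[[:space:]]*$'
}
```

On the server host, run as a Torque manager such as root, `qmgr -c 'list server' | scheduling_enabled || qmgr -c 'set server scheduling = True'` would apply Gus's setting only when it is not already present.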


On Thu, Oct 24, 2013 at 9:34 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:

> Did you enable scheduling?
> On the Torque server:
>
> qmgr -c 'set server scheduling = True'
>
> I hope it helps,
> Gus Correa
>
> On 10/24/2013 02:10 PM, Amjad Syed wrote:
> > Hello
> >
> > We have a small 60-node cluster with 12 cores per node.
> >
> > We are using Torque 2.5.12 and Maui 3.3.
> >
> > When I submit multi-core jobs to Torque using a #PBS directive such as
> > the one below, the jobs stay in the queue and never start. The same job
> > runs when I submit it with qsub in interactive mode.
> >
> > #PBS -l "nodes=1:ppn=4"
> >
> > The errors I get in the Maui log file are as follows:
> >
> >
> > 10/24 20:16:10 ALERT:    job 427 cannot run in any partition
> > 10/24 20:16:10 ALERT:    cannot create new reservation for job 427 (shape[1] 4)
> > 10/24 20:16:10 ALERT:    cannot create new reservation for job 427
> > 10/24 20:16:10 ALERT:    job '427' cannot run (deferring job for 3600 seconds)
> >
> >
> > ----------------------------------------------------------------------
> > The showres command shows the following:
> >
> > Reservations
> >
> > ReservationID       Type S       Start         End    Duration    N/P    StartTime
> >
> >
> > 0 reservations located
> >
> >
> > ----------------------------------------------------------------------
> >
> > showbf gives the following error:
> >
> > Node is blocked  by reservation NONE in INFINITY
> >
> >
> > ----------------------------------------------------------------------
> >
> > checkjob gives the following:
> >
> > checking job 427
> >
> > State: Idle  EState: Deferred
> > Creds:  user:guestuser2  group:guestuser2  class:p60  qos:DEFAULT
> > WallTime: 00:00:00 of 1:00:00
> > SubmitTime: Thu Oct 24 11:11:50
> >    (Time Queued  Total: 10:03:14  Eligible: 00:00:00)
> >
> > Total Tasks: 4
> >
> > Req[0]  TaskCount: 4  Partition: ALL
> > Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> > Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> >
> >
> > IWD: [NONE]  Executable:  [NONE]
> > Bypass: 0  StartCount: 0
> > PartitionMask: [ALL]
> > Flags:       RESTARTABLE
> >
> > job is deferred.  Reason:  NoResources  (cannot create reservation for job '427' (intital reservation attempt)
> > )
> > Holds:    Defer  (hold reason:  NoResources)
> > PE:  4.00  StartPriority:  58
> > cannot select job 427 for partition DEFAULT (job hold active)
> >
> >
> > ----------------------------------------------------------------------
> >
> > My queue configuration is as follows:
> >
> >   queue_type = Execution
> >          max_queuable = 120
> >          max_user_queuable = 120
> >          total_jobs = 3
> >          state_count = Transit:0 Queued:-2 Held:0 Waiting:0 Running:5 Exiting:0
> >          max_running = 20
> >          resources_max.ncpus = 64
> >          resources_max.walltime = 48:00:00
> >          resources_default.neednodes = 1
> >          resources_default.nodect = 1
> >          resources_default.nodes = 1
> >          resources_default.walltime = 01:00:00
> >          mtime = Thu Oct 24 15:28:41 2013
> >          resources_assigned.nodect = 0
> >          max_user_run = 10
> >          enabled = True
> >          started = True
> >
> > ----------------------------------------------------------------------
> >
> > My PBS server configuration is as follows:
> >
> >   server_state = Idle
> >          total_jobs = 3
> >          state_count = Transit:0 Queued:-129 Held:127 Waiting:0 Running:5 Exiting:0
> >          acl_hosts = krplpadul001
> >          default_queue = p60
> >          log_events = 511
> >          mail_from = adm
> >          resources_assigned.nodect = 0
> >          scheduler_iteration = 600
> >          node_check_rate = 150
> >          tcp_timeout = 6
> >          pbs_version = 2.5.12
> >          auto_node_np = False
> >          next_job_number = 428
> >          net_counter = 12 4 2
> >          record_job_info = True
> >          record_job_script = True
> >          job_log_keep_days = 7
> >
> >
> >
> > ----------------------------------------------------------------------
> > Maui schedctl shows the following:
> >
> >
> > REJECTNEGPRIOJOBS[0]              FALSE
> > ENABLENEGJOBPRIORITY[0]           FALSE
> > ENABLEMULTINODEJOBS[0]            TRUE
> > ENABLEMULTIREQJOBS[0]             FALSE
> > BFPRIORITYPOLICY[0]               [NONE]
> > JOBPRIOACCRUALPOLICY            QUEUEPOLICY
> > NODELOADPOLICY                  ADJUSTSTATE
> > USEMACHINESPEED                 FALSE
> > USESYSTEMQUEUETIME              TRUE
> > USELOCALMACHINEPRIORITY         FALSE
> > NODEUNTRACKEDLOADFACTOR         1.2
> > JOBNODEMATCHPOLICY[0]
> >
> > JOBMAXSTARTTIME[0]                  INFINITY
> >
> > METAMAXTASKS[0]                   0
> > NODESETPOLICY[0]                  [NONE]
> > NODESETATTRIBUTE[0]               [NONE]
> > NODESETLIST[0]
> > NODESETDELAY[0]                   00:00:00
> > NODESETPRIORITYTYPE[0]            MINLOSS
> > NODESETTOLERANCE[0]                 0.00
> >
> > BACKFILLPOLICY[0]                 FIRSTFIT
> > BACKFILLDEPTH[0]                  0
> > BACKFILLPROCFACTOR[0]             0
> > BACKFILLMAXSCHEDULES[0]           10000
> > BACKFILLMETRIC[0]                 PROCS
> >
> > BFCHUNKDURATION[0]                00:00:00
> > BFCHUNKSIZE[0]                    0
> > PREEMPTPOLICY[0]                  REQUEUE
> > MINADMINSTIME[0]                  00:00:00
> > RESOURCELIMITPOLICY[0]
> > NODEAVAILABILITYPOLICY[0]         COMBINED:[DEFAULT]
> > NODEALLOCATIONPOLICY[0]           MINRESOURCE
> > TASKDISTRIBUTIONPOLICY[0]         DEFAULT
> > RESERVATIONPOLICY[0]              CURRENTHIGHEST
> > RESERVATIONRETRYTIME[0]           00:00:00
> > RESERVATIONTHRESHOLDTYPE[0]       NONE
> > RESERVATIONTHRESHOLDVALUE[0]      0
> >
> > FSPOLICY                        [NONE]
> > FSPOLICY                        [NONE]
> > FSINTERVAL                      12:00:00
> > FSDEPTH                         8
> > FSDECAY                         1.00
> >
> >
> >
> > # Priority Weights
> >
> > SERVICEWEIGHT[0]                  1
> > TARGETWEIGHT[0]                   1
> > CREDWEIGHT[0]                     1
> > ATTRWEIGHT[0]                     1
> > FSWEIGHT[0]                       1
> > RESWEIGHT[0]                      1
> > USAGEWEIGHT[0]                    1
> > QUEUETIMEWEIGHT[0]                1
> > XFACTORWEIGHT[0]                  0
> > SPVIOLATIONWEIGHT[0]              0
> > BYPASSWEIGHT[0]                   0
> > TARGETQUEUETIMEWEIGHT[0]          0
> > TARGETXFACTORWEIGHT[0]            0
> > USERWEIGHT[0]                     0
> > GROUPWEIGHT[0]                    0
> > ACCOUNTWEIGHT[0]                  0
> > QOSWEIGHT[0]                      0
> > CLASSWEIGHT[0]                    0
> > FSUSERWEIGHT[0]                   0
> > FSGROUPWEIGHT[0]                  0
> > FSACCOUNTWEIGHT[0]                0
> > FSQOSWEIGHT[0]                    0
> > FSCLASSWEIGHT[0]                  0
> > ATTRATTRWEIGHT[0]                 0
> > ATTRSTATEWEIGHT[0]                0
> > NODEWEIGHT[0]                     0
> > PROCWEIGHT[0]                     0
> > MEMWEIGHT[0]                      0
> > SWAPWEIGHT[0]                     0
> > DISKWEIGHT[0]                     0
> > PSWEIGHT[0]                       0
> > PEWEIGHT[0]                       0
> > WALLTIMEWEIGHT[0]                 0
> > UPROCWEIGHT[0]                    0
> > UJOBWEIGHT[0]                     0
> > CONSUMEDWEIGHT[0]                 0
> > USAGEEXECUTIONTIMEWEIGHT[0]       0
> > REMAININGWEIGHT[0]                0
> > PERCENTWEIGHT[0]                  0
> > XFMINWCLIMIT[0]                   00:02:00
> >
> >
> > # partition DEFAULT policies
> >
> > REJECTNEGPRIOJOBS[1]              FALSE
> > ENABLENEGJOBPRIORITY[1]           FALSE
> > ENABLEMULTINODEJOBS[1]            TRUE
> > ENABLEMULTIREQJOBS[1]             FALSE
> > BFPRIORITYPOLICY[1]               [NONE]
> > JOBPRIOACCRUALPOLICY            QUEUEPOLICY
> > NODELOADPOLICY                  ADJUSTSTATE
> > JOBNODEMATCHPOLICY[1]
> >
> > JOBMAXSTARTTIME[1]                  INFINITY
> >
> > METAMAXTASKS[1]                   0
> > NODESETPOLICY[1]                  [NONE]
> > NODESETATTRIBUTE[1]               [NONE]
> > NODESETLIST[1]
> > NODESETDELAY[1]                   00:00:00
> > NODESETPRIORITYTYPE[1]            MINLOSS
> > NODESETTOLERANCE[1]                 0.00
> >
> > # Priority Weights
> >
> > XFMINWCLIMIT[1]                   00:00:00
> >
> > RMAUTHTYPE[0]                     CHECKSUM
> >
> > CLASSCFG[[NONE]]  DEFAULT.FEATURES=[NONE]
> > CLASSCFG[[ALL]]  DEFAULT.FEATURES=[NONE]
> > CLASSCFG[p60]  DEFAULT.FEATURES=[NONE]
> > CLASSCFG[long]  DEFAULT.FEATURES=[NONE]
> > QOSPRIORITY[0]                    0
> > QOSQTWEIGHT[0]                    0
> > QOSXFWEIGHT[0]                    0
> > QOSTARGETXF[0]                      0.00
> > QOSTARGETQT[0]                    00:00:00
> > QOSFLAGS[0]
> > QOSPRIORITY[1]                    0
> > QOSQTWEIGHT[1]                    0
> > QOSXFWEIGHT[1]                    0
> > QOSTARGETXF[1]                      0.00
> > QOSTARGETQT[1]                    00:00:00
> > QOSFLAGS[1]
> > # SERVER MODULES:  MX
> > SERVERMODE                      NORMAL
> > SERVERNAME
> > SERVERHOST                      krplpadul001
> > SERVERPORT                      42559
> > LOGFILE                         maui.log
> > LOGFILEMAXSIZE                  10000000
> > LOGFILEROLLDEPTH                1
> > LOGLEVEL                        3
> > LOGFACILITY                     fALL
> > SERVERHOMEDIR                   /usr/local/maui/
> > TOOLSDIR                        /usr/local/maui/tools/
> > LOGDIR                          /usr/local/maui/log/
> > STATDIR                         /usr/local/maui/stats/
> > LOCKFILE                        /usr/local/maui/maui.pid
> > SERVERCONFIGFILE                /usr/local/maui/maui.cfg
> > CHECKPOINTFILE                  /usr/local/maui/maui.ck
> > CHECKPOINTINTERVAL              00:05:00
> > CHECKPOINTEXPIRATIONTIME        3:11:20:00
> > TRAPJOB
> > TRAPNODE
> > TRAPFUNCTION
> > RESDEPTH                        24
> >
> > RMPOLLINTERVAL                  00:00:30
> > NODEACCESSPOLICY                SHARED
> > ALLOCLOCALITYPOLICY             [NONE]
> > SIMTIMEPOLICY                   [NONE]
> > ADMIN1                          root
> > ADMINHOSTS                      ALL
> > NODEPOLLFREQUENCY               0
> > DISPLAYFLAGS
> > DEFAULTDOMAIN
> > DEFAULTCLASSLIST                [DEFAULT:1]
> > FEATURENODETYPEHEADER
> > FEATUREPROCSPEEDHEADER
> > FEATUREPARTITIONHEADER
> > DEFERTIME                       1:00:00
> > DEFERCOUNT                      24
> > DEFERSTARTCOUNT                 1
> > JOBPURGETIME                    0
> > NODEPURGETIME                   2140000000
> > APIFAILURETHRESHHOLD            6
> > NODESYNCTIME                    600
> > JOBSYNCTIME                     600
> > JOBMAXOVERRUN                   00:10:00
> > NODEMAXLOAD                     0.0
> >
> > PLOTMINTIME                     120
> > PLOTMAXTIME                     245760
> > PLOTTIMESCALE                   11
> > PLOTMINPROC                     1
> > PLOTMAXPROC                     512
> > PLOTPROCSCALE                   9
> > SCHEDCFG[]                        MODE=NORMAL SERVER=krplpadul001:42559
> > # RM MODULES: PBS SSS WIKI NATIVE
> > RMCFG[KRPLPADUL001] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS
> > SIMWORKLOADTRACEFILE            workload
> > SIMRESOURCETRACEFILE            resource
> > SIMAUTOSHUTDOWN                 OFF
> > SIMSTARTTIME                    0
> > SIMSCALEJOBRUNTIME              FALSE
> > SIMFLAGS
> > SIMJOBSUBMISSIONPOLICY          CONSTANTJOBDEPTH
> > SIMINITIALQUEUEDEPTH            16
> > SIMWCACCURACY                   0.00
> > SIMWCACCURACYCHANGE             0.00
> > SIMNODECOUNT                    0
> > SIMNODECONFIGURATION            NORMAL
> > SIMWCSCALINGPERCENT             100
> > SIMCOMRATE                      0.10
> > SIMCOMTYPE                      ROUNDROBIN
> > COMINTRAFRAMECOST               0.30
> > COMINTERFRAMECOST               0.30
> > SIMSTOPITERATION                -1
> > SIMEXITITERATION                -1
> >
> >
> >
> > What am I missing that prevents Maui from scheduling these jobs?
> >
> > Sincerely,
> > Amjad
> >
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
>
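Once scheduling is enabled, one way to confirm that Maui now starts multi-core jobs is to submit a small test job with the same `nodes=1:ppn=4` request used in the original question. This sketch assumes Torque's usual behavior of listing each allocated node once per requested core in `$PBS_NODEFILE`; the job name, walltime, and the `count_cores` function are illustrative choices, not part of the original thread.

```shell
#!/bin/sh
#PBS -N ppn4_test
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:05:00
# Minimal test job. The job name and walltime above are illustrative.

# Print the number of lines in a node file. Torque normally writes one line
# per allocated core, so a nodes=1:ppn=4 request should give 4.
count_cores() {
  wc -l < "$1" | tr -d ' '
}

# Only report when running under Torque, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
  echo "Allocated core slots: $(count_cores "$PBS_NODEFILE")"
  echo "Host(s): $(sort -u "$PBS_NODEFILE" | tr '\n' ' ')"
fi
```

Saved under a hypothetical name such as `ppn4_test.sh` and submitted with `qsub ppn4_test.sh`, the job's output file should report four core slots if the request was honored; if the job instead stays queued, `checkjob` on its ID shows Maui's reason, as in the question above.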