[torqueusers] Multi core jobs are not scheduled by Maui

Amjad Syed amjadcsu at gmail.com
Thu Oct 24 12:10:10 MDT 2013


Hello

We have a small 60-node cluster with 12 cores per node.

We are using Torque 2.5.12 and Maui 3.3.

When I submit multi-core jobs to Torque using a #PBS directive such as the
one below, the jobs stay queued and never start. I can run the same job with
qsub in interactive mode.

#PBS -l "nodes=1:ppn=4"
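
A minimal sketch of a job script carrying that directive is shown here for context. The job name, the walltime line, and the script body are illustrative assumptions, not the script that was actually submitted; the queue name p60 is taken from the configuration further down.

```shell
#!/bin/sh
#PBS -N multicore_test
#PBS -q p60
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00

# Torque sets PBS_NODEFILE to a file with one line per allocated core.
# Fall back to /dev/null so the script also runs outside the batch system.
NODEFILE="${PBS_NODEFILE:-/dev/null}"

# Count the allocated cores (0 when run outside Torque).
NCORES=$(wc -l < "$NODEFILE" | tr -d ' ')
echo "allocated cores: $NCORES"
```

Such a script would be submitted with `qsub multicore_test.sh` (hypothetical file name); the interactive equivalent is `qsub -I -l nodes=1:ppn=4`.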

The error I get in the Maui log file is as follows:


10/24 20:16:10 ALERT:    job 427 cannot run in any partition
10/24 20:16:10 ALERT:    cannot create new reservation for job 427 (shape[1] 4)
10/24 20:16:10 ALERT:    cannot create new reservation for job 427
10/24 20:16:10 ALERT:    job '427' cannot run (deferring job for 3600 seconds)

-----------------------------------------------------------------------------------------------------
showres shows the following:

Reservations

ReservationID       Type S       Start         End    Duration    N/P    StartTime


0 reservations located

---------------------------------------------------------------------------------------------------------------------

showbf gives the following error:

Node is blocked  by reservation NONE in INFINITY

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

checkjob gives the following:

checking job 427

State: Idle  EState: Deferred
Creds:  user:guestuser2  group:guestuser2  class:p60  qos:DEFAULT
WallTime: 00:00:00 of 1:00:00
SubmitTime: Thu Oct 24 11:11:50
  (Time Queued  Total: 10:03:14  Eligible: 00:00:00)

Total Tasks: 4

Req[0]  TaskCount: 4  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE

job is deferred.  Reason:  NoResources  (cannot create reservation for job '427' (intital reservation attempt))
Holds:    Defer  (hold reason:  NoResources)
PE:  4.00  StartPriority:  58
cannot select job 427 for partition DEFAULT (job hold active)

-----------------------------------------------------------------------------------------------------------------------------------------------

My queue configuration is as follows:

 queue_type = Execution
        max_queuable = 120
        max_user_queuable = 120
        total_jobs = 3
        state_count = Transit:0 Queued:-2 Held:0 Waiting:0 Running:5 Exiting:0
        max_running = 20
        resources_max.ncpus = 64
        resources_max.walltime = 48:00:00
        resources_default.neednodes = 1
        resources_default.nodect = 1
        resources_default.nodes = 1
        resources_default.walltime = 01:00:00
        mtime = Thu Oct 24 15:28:41 2013
        resources_assigned.nodect = 0
        max_user_run = 10
        enabled = True
        started = True
----------------------------------------------------------------------------------------------------------------------------------

My PBS server configuration is as follows:

 server_state = Idle
        total_jobs = 3
        state_count = Transit:0 Queued:-129 Held:127 Waiting:0 Running:5 Exiting:0
        acl_hosts = krplpadul001
        default_queue = p60
        log_events = 511
        mail_from = adm
        resources_assigned.nodect = 0
        scheduler_iteration = 600
        node_check_rate = 150
        tcp_timeout = 6
        pbs_version = 2.5.12
        auto_node_np = False
        next_job_number = 428
        net_counter = 12 4 2
        record_job_info = True
        record_job_script = True
        job_log_keep_days = 7


-------------------------------------------------------------------------------------------------------------------------------------------------------
Maui schedctl shows the following:


REJECTNEGPRIOJOBS[0]              FALSE
ENABLENEGJOBPRIORITY[0]           FALSE
ENABLEMULTINODEJOBS[0]            TRUE
ENABLEMULTIREQJOBS[0]             FALSE
BFPRIORITYPOLICY[0]               [NONE]
JOBPRIOACCRUALPOLICY            QUEUEPOLICY
NODELOADPOLICY                  ADJUSTSTATE
USEMACHINESPEED                 FALSE
USESYSTEMQUEUETIME              TRUE
USELOCALMACHINEPRIORITY         FALSE
NODEUNTRACKEDLOADFACTOR         1.2
JOBNODEMATCHPOLICY[0]

JOBMAXSTARTTIME[0]                  INFINITY

METAMAXTASKS[0]                   0
NODESETPOLICY[0]                  [NONE]
NODESETATTRIBUTE[0]               [NONE]
NODESETLIST[0]
NODESETDELAY[0]                   00:00:00
NODESETPRIORITYTYPE[0]            MINLOSS
NODESETTOLERANCE[0]                 0.00

BACKFILLPOLICY[0]                 FIRSTFIT
BACKFILLDEPTH[0]                  0
BACKFILLPROCFACTOR[0]             0
BACKFILLMAXSCHEDULES[0]           10000
BACKFILLMETRIC[0]                 PROCS

BFCHUNKDURATION[0]                00:00:00
BFCHUNKSIZE[0]                    0
PREEMPTPOLICY[0]                  REQUEUE
MINADMINSTIME[0]                  00:00:00
RESOURCELIMITPOLICY[0]
NODEAVAILABILITYPOLICY[0]         COMBINED:[DEFAULT]
NODEALLOCATIONPOLICY[0]           MINRESOURCE
TASKDISTRIBUTIONPOLICY[0]         DEFAULT
RESERVATIONPOLICY[0]              CURRENTHIGHEST
RESERVATIONRETRYTIME[0]           00:00:00
RESERVATIONTHRESHOLDTYPE[0]       NONE
RESERVATIONTHRESHOLDVALUE[0]      0

FSPOLICY                        [NONE]
FSPOLICY                        [NONE]
FSINTERVAL                      12:00:00
FSDEPTH                         8
FSDECAY                         1.00



# Priority Weights

SERVICEWEIGHT[0]                  1
TARGETWEIGHT[0]                   1
CREDWEIGHT[0]                     1
ATTRWEIGHT[0]                     1
FSWEIGHT[0]                       1
RESWEIGHT[0]                      1
USAGEWEIGHT[0]                    1
QUEUETIMEWEIGHT[0]                1
XFACTORWEIGHT[0]                  0
SPVIOLATIONWEIGHT[0]              0
BYPASSWEIGHT[0]                   0
TARGETQUEUETIMEWEIGHT[0]          0
TARGETXFACTORWEIGHT[0]            0
USERWEIGHT[0]                     0
GROUPWEIGHT[0]                    0
ACCOUNTWEIGHT[0]                  0
QOSWEIGHT[0]                      0
CLASSWEIGHT[0]                    0
FSUSERWEIGHT[0]                   0
FSGROUPWEIGHT[0]                  0
FSACCOUNTWEIGHT[0]                0
FSQOSWEIGHT[0]                    0
FSCLASSWEIGHT[0]                  0
ATTRATTRWEIGHT[0]                 0
ATTRSTATEWEIGHT[0]                0
NODEWEIGHT[0]                     0
PROCWEIGHT[0]                     0
MEMWEIGHT[0]                      0
SWAPWEIGHT[0]                     0
DISKWEIGHT[0]                     0
PSWEIGHT[0]                       0
PEWEIGHT[0]                       0
WALLTIMEWEIGHT[0]                 0
UPROCWEIGHT[0]                    0
UJOBWEIGHT[0]                     0
CONSUMEDWEIGHT[0]                 0
USAGEEXECUTIONTIMEWEIGHT[0]       0
REMAININGWEIGHT[0]                0
PERCENTWEIGHT[0]                  0
XFMINWCLIMIT[0]                   00:02:00


# partition DEFAULT policies

REJECTNEGPRIOJOBS[1]              FALSE
ENABLENEGJOBPRIORITY[1]           FALSE
ENABLEMULTINODEJOBS[1]            TRUE
ENABLEMULTIREQJOBS[1]             FALSE
BFPRIORITYPOLICY[1]               [NONE]
JOBPRIOACCRUALPOLICY            QUEUEPOLICY
NODELOADPOLICY                  ADJUSTSTATE
JOBNODEMATCHPOLICY[1]

JOBMAXSTARTTIME[1]                  INFINITY

METAMAXTASKS[1]                   0
NODESETPOLICY[1]                  [NONE]
NODESETATTRIBUTE[1]               [NONE]
NODESETLIST[1]
NODESETDELAY[1]                   00:00:00
NODESETPRIORITYTYPE[1]            MINLOSS
NODESETTOLERANCE[1]                 0.00

# Priority Weights

XFMINWCLIMIT[1]                   00:00:00

RMAUTHTYPE[0]                     CHECKSUM

CLASSCFG[[NONE]]  DEFAULT.FEATURES=[NONE]
CLASSCFG[[ALL]]  DEFAULT.FEATURES=[NONE]
CLASSCFG[p60]  DEFAULT.FEATURES=[NONE]
CLASSCFG[long]  DEFAULT.FEATURES=[NONE]
QOSPRIORITY[0]                    0
QOSQTWEIGHT[0]                    0
QOSXFWEIGHT[0]                    0
QOSTARGETXF[0]                      0.00
QOSTARGETQT[0]                    00:00:00
QOSFLAGS[0]
QOSPRIORITY[1]                    0
QOSQTWEIGHT[1]                    0
QOSXFWEIGHT[1]                    0
QOSTARGETXF[1]                      0.00
QOSTARGETQT[1]                    00:00:00
QOSFLAGS[1]
# SERVER MODULES:  MX
SERVERMODE                      NORMAL
SERVERNAME
SERVERHOST                      krplpadul001
SERVERPORT                      42559
LOGFILE                         maui.log
LOGFILEMAXSIZE                  10000000
LOGFILEROLLDEPTH                1
LOGLEVEL                        3
LOGFACILITY                     fALL
SERVERHOMEDIR                   /usr/local/maui/
TOOLSDIR                        /usr/local/maui/tools/
LOGDIR                          /usr/local/maui/log/
STATDIR                         /usr/local/maui/stats/
LOCKFILE                        /usr/local/maui/maui.pid
SERVERCONFIGFILE                /usr/local/maui/maui.cfg
CHECKPOINTFILE                  /usr/local/maui/maui.ck
CHECKPOINTINTERVAL              00:05:00
CHECKPOINTEXPIRATIONTIME        3:11:20:00
TRAPJOB
TRAPNODE
TRAPFUNCTION
RESDEPTH                        24

RMPOLLINTERVAL                  00:00:30
NODEACCESSPOLICY                SHARED
ALLOCLOCALITYPOLICY             [NONE]
SIMTIMEPOLICY                   [NONE]
ADMIN1                          root
ADMINHOSTS                      ALL
NODEPOLLFREQUENCY               0
DISPLAYFLAGS
DEFAULTDOMAIN
DEFAULTCLASSLIST                [DEFAULT:1]
FEATURENODETYPEHEADER
FEATUREPROCSPEEDHEADER
FEATUREPARTITIONHEADER
DEFERTIME                       1:00:00
DEFERCOUNT                      24
DEFERSTARTCOUNT                 1
JOBPURGETIME                    0
NODEPURGETIME                   2140000000
APIFAILURETHRESHHOLD            6
NODESYNCTIME                    600
JOBSYNCTIME                     600
JOBMAXOVERRUN                   00:10:00
NODEMAXLOAD                     0.0

PLOTMINTIME                     120
PLOTMAXTIME                     245760
PLOTTIMESCALE                   11
PLOTMINPROC                     1
PLOTMAXPROC                     512
PLOTPROCSCALE                   9
SCHEDCFG[]                        MODE=NORMAL SERVER=krplpadul001:42559
# RM MODULES: PBS SSS WIKI NATIVE
RMCFG[KRPLPADUL001] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS
SIMWORKLOADTRACEFILE            workload
SIMRESOURCETRACEFILE            resource
SIMAUTOSHUTDOWN                 OFF
SIMSTARTTIME                    0
SIMSCALEJOBRUNTIME              FALSE
SIMFLAGS
SIMJOBSUBMISSIONPOLICY          CONSTANTJOBDEPTH
SIMINITIALQUEUEDEPTH            16
SIMWCACCURACY                   0.00
SIMWCACCURACYCHANGE             0.00
SIMNODECOUNT                    0
SIMNODECONFIGURATION            NORMAL
SIMWCSCALINGPERCENT             100
SIMCOMRATE                      0.10
SIMCOMTYPE                      ROUNDROBIN
COMINTRAFRAMECOST               0.30
COMINTERFRAMECOST               0.30
SIMSTOPITERATION                -1
SIMEXITITERATION                -1



What am I missing here that prevents Maui from scheduling these jobs?

Sincerely,
Amjad