[Mauiusers] Job is always queued in PBS+maui setup

Mohammad Shafiullah shafirocks at gmail.com
Tue Sep 27 12:49:45 MDT 2005


Hello,

I am trying out the new PBSpro_7.0.0-solaris_sparc and maui3.2.6p13 on a
SunOS 5.10 Generic_118822-11 sun4u sparc SUNW,Sun-Fire-V240 box.

Installation of PBS went smoothly and here's my PBS configuration:

 qmgr

Max open servers: 4

Qmgr: print server

#

# Create queues and set their attributes.

#

#

# Create and define queue masternode

#

create queue masternode

set queue masternode queue_type = Execution

set queue masternode enabled = True

set queue masternode started = True

#

# Create and define queue workq

#

create queue workq

set queue workq queue_type = Execution

set queue workq enabled = True

set queue workq started = True

#

# Create and define queue default

#

create queue default

set queue default queue_type = Execution

set queue default enabled = True

set queue default started = True

#

# Create and define queue GIGOgene

#

create queue GIGOgene

set queue GIGOgene queue_type = Execution

set queue GIGOgene enabled = True

set queue GIGOgene started = True

#

# Set server attributes.

#

set server scheduling = True

set server acl_host_enable = True

set server acl_hosts = *.gds.unomaha.edu

set server acl_user_enable = True

set server acl_users = nazo

set server acl_users += root@bfmaster.gds.unomaha.edu

set server acl_roots = root@bfmaster.gds.unomaha.edu

set server managers = root@bfmaster.gds.unomaha.edu

set server default_queue = default

set server log_events = 511

set server mail_from = adm

set server query_other_jobs = True

set server resources_default.ncpus = 1

set server scheduler_iteration = 600

set server resv_enable = True

set server node_fail_requeue = 310

set server max_array_size = 10000
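
For anyone who wants to cross-check a dump like the one above, here is a minimal sketch in Python that collects the `create queue` and `set ...` lines into dictionaries, so it is easy to confirm that the server's default_queue names a queue that exists and is started. The names `parse_qmgr`, `SET_RE`, and `SAMPLE` are hypothetical helpers for illustration, not part of PBS or Maui, and the sample text is only a shortened copy of the output shown above.

```python
import re

# Matches "set queue NAME attr = value" and "set server attr (=|+=) value".
SET_RE = re.compile(r"^set\s+(queue\s+(\S+)|server)\s+(\S+)\s*(\+?=)\s*(.*)$")

def parse_qmgr(text):
    """Return (queues, server) dicts built from 'print server' style output."""
    queues, server = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        created = re.match(r"^create\s+queue\s+(\S+)$", line)
        if created:
            queues.setdefault(created.group(1), {})
            continue
        m = SET_RE.match(line)
        if not m:
            continue  # comments, blank lines, prompts
        _, qname, attr, op, value = m.groups()
        target = server if qname is None else queues.setdefault(qname, {})
        if op == "+=" and attr in target:
            target[attr] += "," + value  # keep appended list values together
        else:
            target[attr] = value
    return queues, server

SAMPLE = """\
# Create and define queue default
create queue default
set queue default queue_type = Execution
set queue default enabled = True
set queue default started = True
set server acl_users = nazo
set server acl_users += root@bfmaster.gds.unomaha.edu
set server default_queue = default
"""

if __name__ == "__main__":
    queues, server = parse_qmgr(SAMPLE)
    name = server["default_queue"]
    print(name, "defined:", name in queues, "started:", queues[name].get("started"))
```

This only checks the PBS side of the setup; it says nothing about how Maui maps those queues internally.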

I have 20 nodes allocated for job submissions, and they belong to one of the
queues defined above.

 When compiling maui I changed the following in the Makefile:

export OSCCFLAGS=-m64 -g

export OSLDFLAGS=-m64 -lsocket -lnsl -lresolv

The problem appears when I specify a PBS-defined queue for a job to run in.
For some reason Maui assigns every node to its own built-in DEFAULT queue,
which PBS does not recognize, so the job never runs and stays queued.

Here is the portion of maui.log that I think is relevant (using node 3,
blackforest3, as the example):

09/27 12:05:06 __MPBSGetNodeState(Name,State,PNode)

09/27 12:05:06 INFO: PBS node blackforest3 set to state Idle (free)

09/27 12:05:06 MNodeFind(blackforest3,N)

09/27 12:05:06 MNodeAdd(blackforest3,N)

09/27 12:05:06 MNodeFind(blackforest3,N)

09/27 12:05:06 MRMNodePreLoad(blackforest3,Idle,BFMASTER)

09/27 12:05:06 MPBSNodeLoad(blackforest3,blackforest3,Idle,BFMASTER)

09/27 12:05:06 INFO: PBS node attribute 'Host' value: 'blackforest3' (r: NULL)

09/27 12:05:06 INFO: PBS node attribute 'Port' value: '15002' (r: NULL)

09/27 12:05:06 INFO: PBS node attribute 'ntype' value: 'cluster' (r: NULL)

09/27 12:05:06 INFO: PBS node attribute 'state' value: 'free' (r: NULL)

09/27 12:05:06 INFO: PBS node attribute 'license' value: 'l' (r: NULL)

09/27 12:05:06 MUMAGetIndex(GRes,l,ADD)

09/27 12:05:06 INFO: adding MAList[GRes][1]: 'l'

09/27 12:05:06 INFO: PBS node attribute 'pcpus' value: '2' (r: NULL)

09/27 12:05:06 INFO: PBS node attribute 'properties' value: 'node3' (r: NULL)

09/27 12:05:06 MUGetMAttr(Feature,node3,ADD,16)

09/27 12:05:06 INFO: added MAList[Feature][1]: 'node3'

09/27 12:05:06 INFO: PBS node attribute 'resources_available' value: 'linux' (r: arch)

09/27 12:05:06 MUMAGetIndex(Arch,linux,ADD)

09/27 12:05:06 INFO: adding MAList[Arch][1]: 'linux'

09/27 12:05:06 INFO: PBS node attribute 'resources_available' value: '1035664kb' (r: mem)

09/27 12:05:06 INFO: PBS node attribute 'resources_available' value: '2' (r: ncpus)

09/27 12:05:06 INFO: PBS node attribute 'resources_assigned' value: '0kb' (r: mem)

09/27 12:05:06 INFO: PBS node attribute 'resources_assigned' value: '0' (r: ncpus)

09/27 12:05:06 INFO: PBS node attribute 'queue' value: 'default' (r: NULL)

09/27 12:05:06 INFO: PBS node attribute 'resv_enable' value: 'True' (r: NULL)

09/27 12:05:06 MUMAGetIndex(Opsys,DEFAULT,ADD)

09/27 12:05:06 INFO: adding MAList[Opsys][1]: 'DEFAULT'

09/27 12:05:06 MUMAGetBM(Network,DEFAULT,3)

09/27 12:05:06 INFO: adding MAList[Network][1]: 'DEFAULT'

09/27 12:05:06 INFO: MNode[000] ' blackforest3' Idle VM: 1011 Mem: 1011 Dk: 1 Cl: [NONE] [node3]

09/27 12:05:06 INFO: MNode[000] ' blackforest3' C/A/D procs: 2/2/-1

09/27 12:05:06 MRMNodePostLoad(blackforest3)

09/27 12:05:06 MNodeLoadConfig(blackforest3,NULL)

09/27 12:05:06 MCfgGetSVal(Buf,CurPtr,NODECFG,blackforest3,Index,Value,SymTable)

09/27 12:05:06 MCfgGetVal(Buf,NODECFG,blackforest3,Index,Value,1024,SymTable)

09/27 12:05:06 MCPRestore(NODE,blackforest3,Optr)

09/27 12:05:06 MNodeLoadCP(NP,NODE blackforest3 1127809549 <node STATACTIVETIME="0" STATTOTALTIME="884539" STATUPTIME="884539"></node>)

09/27 12:05:06 MUGetIndex(STATACTIVETIME,ValList,0)

09/27 12:05:06 MUGetIndex(STATTOTALTIME,ValList,0)

09/27 12:05:06 MUGetIndex(STATUPTIME,ValList,0)

09/27 12:05:06 MNodeUpdateResExpression(blackforest3)

09/27 12:05:06 MNodeGetLocation(blackforest3)

09/27 12:05:06 INFO: RMName 'blackforest3' set for node[01][01] 'blackforest3'

09/27 12:05:06 INFO: node slot not set on node 'blackforest3'

09/27 12:05:06 INFO: node 'blackforest3' assigned to location F:1/S:1

09/27 12:05:06 MNodeShow(blackforest3)

[000] blackforest3: (P:2,S:1011,M:1011,D:1) [Idle][DEFAULT][linux]<0.000000> C:[NONE][DEFAULT] [node3] [NONE]
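
To compare what PBS reports for a node with what Maui ends up recording, it can help to pull the attribute lines out of the log. The following is a minimal sketch in Python, assuming the "INFO: PBS node attribute ..." line format seen in the excerpt above; `pbs_node_attributes`, `ATTR_RE`, and `SAMPLE` are hypothetical names for illustration, not Maui or PBS tools.

```python
import re

# Matches maui.log lines of the form seen above:
#   INFO: PBS node attribute 'queue' value: 'default' (r: NULL)
ATTR_RE = re.compile(
    r"INFO:\s+PBS node attribute '([^']*)' value:\s*'([^']*)'\s*\(r:\s*([^)]*)\)"
)

def pbs_node_attributes(log_text):
    """Return a dict of the PBS node attributes that Maui logged.

    Resource-style attributes are keyed as 'name.resource'
    (e.g. 'resources_available.ncpus'); plain ones just by name.
    """
    # Mail clients may wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    attrs = {}
    for name, value, resource in ATTR_RE.findall(flat):
        resource = resource.strip()
        key = name if resource == "NULL" else f"{name}.{resource}"
        attrs[key] = value
    return attrs

SAMPLE = """\
09/27 12:05:06 INFO: PBS node attribute 'pcpus' value: '2' (r: NULL)
09/27 12:05:06 INFO: PBS node attribute 'resources_available' value: '2' (r:
ncpus)
09/27 12:05:06 INFO: PBS node attribute 'queue' value: 'default' (r: NULL)
"""

if __name__ == "__main__":
    for key, value in pbs_node_attributes(SAMPLE).items():
        print(f"{key} = {value}")
```

In the excerpt above, PBS reports the node's queue as 'default', while the MNode summary line shows Cl: [NONE], which is the mismatch in question.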

I even tried manually assigning a node to a specific partition:

NODECFG[blackforest3] PARTITION=default

(Here default is a different name from Maui's built-in DEFAULT.) That did not
work either.

This is driving me crazy. Please email me if you have run into this problem
before, or if you need more information.

Thank you.

----------------------------------------------------------
Mohammad Shafiullah
Research Systems Assistant
Blackforest Cluster Computing Project
University of Nebraska at Omaha
----------------------------------------------------------