[Mauiusers] Job is always queued in PBS+maui setup
Mohammad Shafiullah
shafirocks at gmail.com
Tue Sep 27 12:49:45 MDT 2005
Hello,
I am trying out the new PBSpro_7.0.0-solaris_sparc and maui3.2.6p13 on a
SunOS 5.10 Generic_118822-11 sun4u sparc SUNW,Sun-Fire-V240 box.
Installation of PBS went smoothly, and here's my PBS configuration:
qmgr
Max open servers: 4
Qmgr: print server
#
# Create queues and set their attributes.
#
#
# Create and define queue masternode
#
create queue masternode
set queue masternode queue_type = Execution
set queue masternode enabled = True
set queue masternode started = True
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default enabled = True
set queue default started = True
#
# Create and define queue GIGOgene
#
create queue GIGOgene
set queue GIGOgene queue_type = Execution
set queue GIGOgene enabled = True
set queue GIGOgene started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = *.gds.unomaha.edu
set server acl_user_enable = True
set server acl_users = nazo
set server acl_users += root@bfmaster.gds.unomaha.edu
set server acl_roots = root@bfmaster.gds.unomaha.edu
set server managers = root@bfmaster.gds.unomaha.edu
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
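To double-check which queues PBS actually defines and whether they can run jobs, the `print server` output above can be parsed with a short script. This is a minimal Python sketch, assuming the text is captured as-is from qmgr's `print server` command; `parse_queues` and `runnable_queues` are hypothetical helpers, not part of PBS or Maui.

```python
import re

# "create queue <name>"
CREATE_QUEUE = re.compile(r"^create queue (\S+)$")
# "set queue <name> <attribute> = <value>"; "+=" (append) is treated like "=" here.
SET_QUEUE = re.compile(r"^set queue (\S+) (\S+) \+?= (.*)$")


def parse_queues(qmgr_text):
    """Return {queue_name: {attribute: value}} from qmgr 'print server' text."""
    queues = {}
    for raw in qmgr_text.splitlines():
        line = raw.strip()
        created = CREATE_QUEUE.match(line)
        if created:
            # Register the queue even before any attributes are set on it.
            queues.setdefault(created.group(1), {})
            continue
        attr = SET_QUEUE.match(line)
        if attr:
            name, key, value = attr.groups()
            queues.setdefault(name, {})[key] = value.strip()
    # "set server ..." lines and "#" comments are ignored.
    return queues


def runnable_queues(queues):
    """Sorted names of Execution queues that are both enabled and started."""
    return sorted(
        name
        for name, attrs in queues.items()
        if attrs.get("queue_type") == "Execution"
        and attrs.get("enabled") == "True"
        and attrs.get("started") == "True"
    )


SAMPLE = """\
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
create queue GIGOgene
set queue GIGOgene queue_type = Execution
set queue GIGOgene enabled = True
set queue GIGOgene started = True
"""

if __name__ == "__main__":
    print(runnable_queues(parse_queues(SAMPLE)))  # ['GIGOgene', 'workq']
```

Comparing this list with the queue name given at job submission rules out a simple typo or a stopped queue on the PBS side.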
I have 20 nodes allocated for job submission, and they belong to a particular
queue defined above.
When compiling Maui, I changed the following in the Makefile:
export OSCCFLAGS=-m64 -g
export OSLDFLAGS=-m64 -lsocket -lnsl -lresolv
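Flags copied through an e-mail client or word processor sometimes pick up a typographic dash (for example U+2013 EN DASH) in place of the ASCII hyphen, and the compiler driver then typically treats such an argument as an input file name rather than an option. A minimal Python sketch to scan a flag string for such characters; `find_suspicious_dashes` is a hypothetical helper, and the sample string is the OSLDFLAGS value with an en dash deliberately placed before `lresolv`.

```python
import unicodedata


def find_suspicious_dashes(flags):
    """Return (index, char, Unicode name) for dash-like characters other than ASCII '-'."""
    hits = []
    for index, ch in enumerate(flags):
        # "Pd" (dash punctuation) covers en/em dashes; U+2212 MINUS SIGN is "Sm",
        # so it is listed explicitly. ASCII '-' is also "Pd" and is excluded below.
        is_dash = unicodedata.category(ch) == "Pd" or ch == "\u2212"
        if is_dash and ch != "-":
            hits.append((index, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__":
    ldflags = "-m64 -lsocket -lnsl \u2013lresolv"  # en dash before "lresolv"
    for index, ch, name in find_suspicious_dashes(ldflags):
        print(f"position {index}: {name} (U+{ord(ch):04X})")
```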
The problem occurs when I specify a PBS-defined queue for a job to run in.
For some reason, Maui assigns all the nodes its own built-in DEFAULT
queue, which PBS does not recognize, so the job never runs and stays queued.
Here's the portion of maui.log that I think is relevant
(this excerpt is for node 3, blackforest3):
09/27 12:05:06 __MPBSGetNodeState(Name,State,PNode)
09/27 12:05:06 INFO: PBS node blackforest3 set to state Idle (free)
09/27 12:05:06 MNodeFind(blackforest3,N)
09/27 12:05:06 MNodeAdd(blackforest3,N)
09/27 12:05:06 MNodeFind(blackforest3,N)
09/27 12:05:06 MRMNodePreLoad(blackforest3,Idle,BFMASTER)
09/27 12:05:06 MPBSNodeLoad(blackforest3,blackforest3,Idle,BFMASTER)
09/27 12:05:06 INFO: PBS node attribute 'Host' value: 'blackforest3' (r: NULL)
09/27 12:05:06 INFO: PBS node attribute 'Port' value: '15002' (r: NULL)
09/27 12:05:06 INFO: PBS node attribute 'ntype' value: 'cluster' (r: NULL)
09/27 12:05:06 INFO: PBS node attribute 'state' value: 'free' (r: NULL)
09/27 12:05:06 INFO: PBS node attribute 'license' value: 'l' (r: NULL)
09/27 12:05:06 MUMAGetIndex(GRes,l,ADD)
09/27 12:05:06 INFO: adding MAList[GRes][1]: 'l'
09/27 12:05:06 INFO: PBS node attribute 'pcpus' value: '2' (r: NULL)
09/27 12:05:06 INFO: PBS node attribute 'properties' value: 'node3' (r: NULL)
09/27 12:05:06 MUGetMAttr(Feature,node3,ADD,16)
09/27 12:05:06 INFO: added MAList[Feature][1]: 'node3'
09/27 12:05:06 INFO: PBS node attribute 'resources_available' value: 'linux' (r: arch)
09/27 12:05:06 MUMAGetIndex(Arch,linux,ADD)
09/27 12:05:06 INFO: adding MAList[Arch][1]: 'linux'
09/27 12:05:06 INFO: PBS node attribute 'resources_available' value: '1035664kb' (r: mem)
09/27 12:05:06 INFO: PBS node attribute 'resources_available' value: '2' (r: ncpus)
09/27 12:05:06 INFO: PBS node attribute 'resources_assigned' value: '0kb' (r: mem)
09/27 12:05:06 INFO: PBS node attribute 'resources_assigned' value: '0' (r: ncpus)
09/27 12:05:06 INFO: PBS node attribute 'queue' value: 'default' (r: NULL)
09/27 12:05:06 INFO: PBS node attribute 'resv_enable' value: 'True' (r: NULL)
09/27 12:05:06 MUMAGetIndex(Opsys,DEFAULT,ADD)
09/27 12:05:06 INFO: adding MAList[Opsys][1]: 'DEFAULT'
09/27 12:05:06 MUMAGetBM(Network,DEFAULT,3)
09/27 12:05:06 INFO: adding MAList[Network][1]: 'DEFAULT'
09/27 12:05:06 INFO: MNode[000] ' blackforest3' Idle VM: 1011 Mem: 1011 Dk: 1 Cl: [NONE] [node3]
09/27 12:05:06 INFO: MNode[000] ' blackforest3' C/A/D procs: 2/2/-1
09/27 12:05:06 MRMNodePostLoad(blackforest3)
09/27 12:05:06 MNodeLoadConfig(blackforest3,NULL)
09/27 12:05:06 MCfgGetSVal(Buf,CurPtr,NODECFG,blackforest3,Index,Value,SymTable)
09/27 12:05:06 MCfgGetVal(Buf,NODECFG,blackforest3,Index,Value,1024,SymTable)
09/27 12:05:06 MCPRestore(NODE,blackforest3,Optr)
09/27 12:05:06 MNodeLoadCP(NP,NODE blackforest3 1127809549 <node STATACTIVETIME="0" STATTOTALTIME="884539" STATUPTIME="884539"></node>)
09/27 12:05:06 MUGetIndex(STATACTIVETIME,ValList,0)
09/27 12:05:06 MUGetIndex(STATTOTALTIME,ValList,0)
09/27 12:05:06 MUGetIndex(STATUPTIME,ValList,0)
09/27 12:05:06 MNodeUpdateResExpression(blackforest3)
09/27 12:05:06 MNodeGetLocation(blackforest3)
09/27 12:05:06 INFO: RMName 'blackforest3' set for node[01][01] 'blackforest3'
09/27 12:05:06 INFO: node slot not set on node 'blackforest3'
09/27 12:05:06 INFO: node 'blackforest3' assigned to location F:1/S:1
09/27 12:05:06 MNodeShow(blackforest3)
[000] blackforest3: (P:2,S:1011,M:1011,D:1) [Idle][DEFAULT][linux]<0.000000>
C:[NONE][DEFAULT] [node3] [NONE]
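To compare what PBS reports for a node with what Maui ends up using, the "PBS node attribute" lines can be pulled out of maui.log. A minimal Python sketch, assuming each log entry sits on a single unwrapped line and follows the format shown in the excerpt above; `node_attributes` is a hypothetical helper, not part of Maui.

```python
import re

# Shape of the PBS node attribute lines as they appear in this maui.log excerpt:
#   ... INFO: PBS node attribute 'queue' value: 'default' (r: NULL)
ATTR_LINE = re.compile(
    r"INFO: PBS node attribute '(?P<attr>[^']+)' value: '(?P<value>[^']*)' \(r: (?P<res>[^)]*)\)"
)


def node_attributes(log_text):
    """Map attribute (or attribute.resource) -> value from one node's log excerpt."""
    attrs = {}
    for line in log_text.splitlines():
        match = ATTR_LINE.search(line)
        if match is None:
            continue  # not an attribute line (e.g. MNodeShow, MUMAGetIndex)
        key = match["attr"]
        if match["res"] != "NULL":
            # resources_available / resources_assigned name the resource in "(r: ...)"
            key = f"{key}.{match['res']}"
        attrs[key] = match["value"]
    return attrs


SAMPLE = """\
09/27 12:05:06 INFO: PBS node attribute 'state' value: 'free' (r: NULL)
09/27 12:05:06 INFO: PBS node attribute 'resources_available' value: '2' (r: ncpus)
09/27 12:05:06 INFO: PBS node attribute 'queue' value: 'default' (r: NULL)
09/27 12:05:06 MUMAGetIndex(Opsys,DEFAULT,ADD)
"""

if __name__ == "__main__":
    print(node_attributes(SAMPLE).get("queue"))  # default
```

The extracted `queue` value can then be checked against the queue named in the job submission, independently of the DEFAULT labels Maui prints in its own node summary.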
I even tried manually assigning the nodes a specific partition in maui.cfg:
NODECFG[blackforest3] PARTITION=default   (note: 'default', not 'DEFAULT')
It did not work.
This is driving me crazy. Please email me if you have run into this problem
before or if you need more information.
Thank you.
----------------------------------------------------------
Mohammad Shafiullah
Research Systems Assistant
Blackforest Cluster Computing Project
University of Nebraska at Omaha
----------------------------------------------------------