[Mauiusers] Assigning Nodes to specific nodes

Jonathan Smale JRS221 at bham.ac.uk
Tue Oct 18 09:24:55 MDT 2011

Dear Maui & Torque users,

Following on from my previous email, which Brian Mendenhall was good enough to help with: that post did have a couple of typos.  I am trying to create separate queues for the three different types of nodes in our cluster, so that users can choose which type to submit to.  Submitting to the default queue works fine, but jobs submitted to the specific queues stay queued indefinitely.  The problem appears to be shown on the last line of the 'checkjob -v' output:

cannot select job 1015 for partition DEFAULT (Class)

The same checkjob output shows the class to be firstgen, which has the following settings according to the "qmgr -c 'p s'" command:

# Create and define queue firstgen
create queue firstgen
set queue firstgen queue_type = Execution
set queue firstgen Priority = 100
set queue firstgen acl_host_enable = False
set queue firstgen acl_hosts = che-hydra+localhost
set queue firstgen resources_default.neednodes = firstgennodes
set queue firstgen resources_default.nodes = 1
set queue firstgen enabled = True
set queue firstgen started = True
# Set server attributes.
set server scheduling = True
set server acl_host_enable = False
set server acl_hosts = che-hydra.bham.ac.uk
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server auto_node_np = True
set server next_job_number = 1022

The important line is "set queue firstgen resources_default.neednodes = firstgennodes", which requires a particular node property.  Both the pbsnodes output and the TORQUE_HOME/server_priv/nodes file show that nodes with this property are available:

     state = free
     np = 4
     properties = firstgennodes
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-1.local 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 15201,nsessions=? 15201,nusers=0,idletime=22288885,totmem=9195716kb,availmem=8914764kb,physmem=8175600kb,ncpus=4,loadave=0.00,netload=260038400155,state=free,jobs=,varattr=,rectime=1318951067

compute-0-1 np=4 firstgennodes
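
For anyone wanting to repeat this check across the whole nodes file, here is a minimal Python sketch.  It assumes the simple layout seen in the line above (node name, then optional key=value settings such as np=4, then free-form property words).  The function names parse_nodes_file and nodes_with_property are illustrative only and are not part of TORQUE, and the second node in the sample text is made up for contrast.

```python
def parse_nodes_file(text):
    """Parse nodes-file text into {name: {"attrs": {...}, "properties": [...]}}.

    Assumption: each entry looks like 'name [key=value ...] [property ...]'.
    Blank lines and lines starting with '#' are skipped.
    """
    nodes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, *tokens = line.split()
        attrs, props = {}, []
        for tok in tokens:
            if "=" in tok:
                # key=value settings such as np=4
                key, value = tok.split("=", 1)
                attrs[key] = value
            else:
                # anything else is treated as a node property
                props.append(tok)
        nodes[name] = {"attrs": attrs, "properties": props}
    return nodes


def nodes_with_property(text, prop):
    """Return the sorted names of nodes whose property list contains prop."""
    parsed = parse_nodes_file(text)
    return sorted(name for name, info in parsed.items()
                  if prop in info["properties"])


if __name__ == "__main__":
    # First line is taken from the post; the second is a hypothetical node.
    sample = "compute-0-1 np=4 firstgennodes\nexample-node np=8 othernodes\n"
    print(nodes_with_property(sample, "firstgennodes"))
```

To run it against the real file, read TORQUE_HOME/server_priv/nodes into a string and pass that string as the first argument.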

Both qstat -f and tracejob show no errors, just that the job is queued.  My maui.cfg file also specifies the following:

NODECFG[compute-0-0] SPEED=1 MAXJOB=4 nodetype=firstgennodes
CLASSCFG[firstgen] hostlist = compute-0-0,compute-0-1,compute-0-2,compute-0-3
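
As a quick consistency check on these two lines, the sketch below extracts the hostlist from the CLASSCFG entry and reports which of those hosts have no NODECFG entry of their own.  It only handles lines written exactly in the form shown above and is not a general maui.cfg parser.  The helper names are hypothetical, and whether Maui actually needs a NODECFG line per host is not something this sketch assumes either way; it simply reports the difference.

```python
import re


def classcfg_hostlist(line):
    """Return (class_name, [hosts]) for 'CLASSCFG[x] hostlist = a,b', else None."""
    m = re.match(r"\s*CLASSCFG\[([^\]]+)\]\s+hostlist\s*=\s*(\S+)",
                 line, re.IGNORECASE)
    if m is None:
        return None
    hosts = [h for h in m.group(2).split(",") if h]
    return m.group(1), hosts


def nodecfg_names(lines):
    """Collect the node names that appear as NODECFG[name] in the given lines."""
    names = set()
    for line in lines:
        m = re.match(r"\s*NODECFG\[([^\]]+)\]", line)
        if m:
            names.add(m.group(1))
    return names


def hosts_without_nodecfg(lines):
    """Map each class to its hostlist entries that lack a NODECFG line."""
    configured = nodecfg_names(lines)
    missing = {}
    for line in lines:
        parsed = classcfg_hostlist(line)
        if parsed is not None:
            cls, hosts = parsed
            missing[cls] = [h for h in hosts if h not in configured]
    return missing


if __name__ == "__main__":
    cfg = [
        "NODECFG[compute-0-0] SPEED=1 MAXJOB=4 nodetype=firstgennodes",
        "CLASSCFG[firstgen] hostlist = compute-0-0,compute-0-1,compute-0-2,compute-0-3",
    ]
    print(hosts_without_nodecfg(cfg))
```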

The only thing that looks wrong to me is the output of the diagnose command.  When I run diagnose -j on a job submitted to the firstgen queue, I receive the following:

[root@che-hydra home]# diagnose -j 1015
Name                  State Par Proc QOS     WCLimit R  Min     User    Group  Account  QueuedTime  Network  Opsys   Arch    Mem   Disk  Procs       Class Features

1015                   Idle ALL    1 DEF 99:23:59:59 0    1   jsmale    worth        -    00:39:35   [NONE] [NONE] [NONE]    >=0    >=0    NC0 [firstgen:1] [firstgennodes]

This appears to show two class features.  I may be misreading the output, but it doesn't look right to me.  What is really worrying is that the diagnose -Q command doesn't show the firstgen queue at all:

diagnose -Q
QOS Status

System QOS Settings:  QList: DEFAULT:0 (Def: DEFAULT)  Flags: 0

Name                * Priority QTWeight QTTarget XFWeight XFTarget     QFlags   JobFlags Limits

DEFAULT                      0        0        0        0     0.00     [NONE]     [NONE] [NONE]
[ALL]                        0        0        0        0     0.00     [NONE]     [NONE] [NONE]

Any help would be appreciated.

Jonathan Smale
Postgraduate Research Student
School of Chemistry

