[torqueusers] Toy routing queues not working correctly.

Jeremy Hallum jhallum at umich.edu
Fri Jul 11 06:37:17 MDT 2008

I'm working with some toy routing queues and I don't understand why
they aren't working.  Here are the specs:

Torque 2.3.0

The pbs_server configuration, as printed by qmgr, is below:
create queue small
set queue small queue_type = Execution
set queue small Priority = 20
set queue small resources_max.nodes = 1
set queue small resources_min.nodes = 1
set queue small resources_default.nodes = 1
set queue small enabled = True
set queue small started = True
# Create and define queue default
create queue default
set queue default queue_type = Route
set queue default route_destinations = large
set queue default route_destinations += medium
set queue default route_destinations += small
set queue default enabled = True
set queue default started = True
# Create and define queue medium
create queue medium
set queue medium queue_type = Execution
set queue medium Priority = 20
set queue medium resources_max.nodes = 7
set queue medium resources_min.nodes = 2
set queue medium resources_default.nodes = 4
set queue medium enabled = True
set queue medium started = True
# Create and define queue large
create queue large
set queue large queue_type = Execution
set queue large Priority = 20
set queue large resources_max.nodes = 32
set queue large resources_min.nodes = 8
set queue large resources_default.nodes = 10
set queue large enabled = True
set queue large started = True
# Set server attributes.
set server scheduling = True
set server acl_hosts = xxx.xxx.lsa.umich.edu
set server managers = maui@xxx.xxx.lsa.umich.edu
set server managers += root@xxx.xxx.lsa.umich.edu
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server log_level = 0
set server queue_centric_limits = True
set server next_job_number = 507

As you can see, it's a really basic routing model.  The problem is that
when a job requesting 4 nodes is submitted, it lands in the first queue,
large, rather than falling through to the medium queue, where it
belongs.

If I disable and stop the first queue, it skips to the small queue,
rather than using the medium Execution queue. 
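
To spell out the behavior I expect, here is a minimal Python sketch of
the routing logic as I understand it. It is only a model of my
expectation, not Torque's actual implementation; the names ExecQueue,
accepts, and route are made up for illustration. The node limits and
the destination order are copied from the configuration above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecQueue:
    """Hypothetical stand-in for an execution queue and its node limits."""
    name: str
    min_nodes: int          # resources_min.nodes
    max_nodes: int          # resources_max.nodes
    enabled: bool = True    # enabled = True/False

    def accepts(self, nodes: int) -> bool:
        # Expected rule: the request must fall within [min, max].
        return self.enabled and self.min_nodes <= nodes <= self.max_nodes


def route(nodes: int, destinations: List[ExecQueue]) -> Optional[str]:
    """Return the first destination that accepts the job, or None."""
    for queue in destinations:
        if queue.accepts(nodes):
            return queue.name
    return None


# Same order as route_destinations on the "default" queue.
DESTINATIONS = [
    ExecQueue("large", min_nodes=8, max_nodes=32),
    ExecQueue("medium", min_nodes=2, max_nodes=7),
    ExecQueue("small", min_nodes=1, max_nodes=1),
]

print(route(4, DESTINATIONS))  # expected destination for a 4-node job
```

Under this model a 4-node job goes to medium, whether or not large is
enabled. What I actually observe is large in the first case and small
in the second, so the server does not seem to be applying the limits
the way this sketch does.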

I've tried using tracejob and increasing the log_level (to 7) at the
server level to determine what logic the server is using to place the
jobs, but that's not enough.  The best info I get is:

07/11/2008 08:04:42  S    enqueuing into default, state 1 hop 1
07/11/2008 08:04:42  S    dequeuing from default, state QUEUED
07/11/2008 08:04:42  S    enqueuing into large, state 1 hop 1

Has anyone else seen a problem like this? What other steps can I take to
try to diagnose the problem?  I've tried:

- Recreating the entire pbs_server database (pbs_server -t create).
- Flipping the order of the queues (the first queue in the list always
  gets the job).
- Switching schedulers.  At first I used Maui; I switched to pbs_sched
  later on and it still isn't working right, which is why I suspect a
  setting in qmgr is the cause.

Thanks for any help you can give, and let me know if you need more info.


Jeremy Hallum
System Administrator, Research Systems Group
LSA Information Technology
University of Michigan
jhallum at umich.edu
