[torqueusers] Torque over-limit submission accepted.

Marc Mendez-Bermond marc.mendezbermond at gmail.com
Sun Sep 4 07:08:17 MDT 2011


Hi all,

I am struggling with a *new* installation of Torque coupled with Maui 
in which 3 execution queues are defined. When I submit a job to the 
"small" queue requesting more cores than its maximum allows, the job is 
accepted.

For example, the "small" queue is defined with 
'set queue small resources_max.ncpus = 12', yet it accepts 
'-l nodes=2:ppn=12 -q small' requests (24 cores). It looks like only the 
nodes value is considered, which seems to be confirmed by the following: 
a '-l nodes=14:ppn=12' request (168 cores) is routed to the "medium" 
queue, defined with 'set queue medium resources_max.ncpus = 64'.
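
To make the arithmetic behind these two requests explicit, here is a 
minimal Python sketch. It assumes that a request of the form 
'nodes=N:ppn=M' asks for N * M cores in total and that ppn defaults to 1 
when omitted. The helper name total_cores is hypothetical, only the 
simple 'nodes=N[:ppn=M]' form is handled, and this is not how Torque 
itself evaluates a request.

```python
import re


def total_cores(spec):
    """Return (nodes, ppn, nodes * ppn) for a simple 'nodes=N[:ppn=M]' spec.

    Assumption: only this simple form is parsed; ppn defaults to 1.
    """
    match = re.fullmatch(r"nodes=(\d+)(?::ppn=(\d+))?", spec)
    if match is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    nodes = int(match.group(1))
    ppn = int(match.group(2)) if match.group(2) else 1
    return nodes, ppn, nodes * ppn


# The two requests from this message, with the resources_max.ncpus value
# of the queue each one ended up in.
for spec, queue, max_ncpus in [("nodes=2:ppn=12", "small", 12),
                               ("nodes=14:ppn=12", "medium", 64)]:
    nodes, ppn, cores = total_cores(spec)
    print(f"{spec}: {cores} cores requested, "
          f"{queue} resources_max.ncpus = {max_ncpus}, "
          f"over limit: {cores > max_ncpus}")
```

Under that reading both requests exceed the ncpus limit of the queue 
that accepted them, while their node counts (2 and 14) do not.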

Versions are :
- torque-2.5.7-1.el5.1 (EPEL5 RPMs for RHEL/CENTOS 5)
- maui-3.3-4.el5 (https://svnweb.cern.ch/trac/maui)

The server configuration is detailed below. I do not think Maui is the 
cause, since using pbs_sched leads to the same issue.

Any help is appreciated!

Regards,
M.

======

#
# Create queues and set their attributes.
#
#
# Create and define queue medium
#
create queue medium
set queue medium queue_type = Execution
set queue medium max_queuable = 100
set queue medium resources_max.ncpus = 64
set queue medium resources_max.nodect = 64
set queue medium resources_min.ncpus = 13
set queue medium resources_default.walltime = 48:00:00
set queue medium enabled = True
set queue medium started = True
#
# Create and define queue large
#
create queue large
set queue large queue_type = Execution
set queue large max_queuable = 100
set queue large resources_max.ncpus = 168
set queue large resources_max.nodect = 168
set queue large resources_min.ncpus = 65
set queue large resources_default.walltime = 24:00:00
set queue large enabled = True
set queue large started = True
#
# Create and define queue small
#
create queue small
set queue small queue_type = Execution
set queue small max_queuable = 100
set queue small resources_max.ncpus = 12
set queue small resources_max.nodect = 12
set queue small resources_default.walltime = 96:00:00
set queue small enabled = True
set queue small started = True
#
# Create and define queue portalq
#
create queue portalq
set queue portalq queue_type = Route
set queue portalq route_destinations = small
set queue portalq route_destinations += medium
set queue portalq route_destinations += large
set queue portalq enabled = True
set queue portalq started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = master.mycluster.org
set server managers = root@master.mycluster.org
set server operators = root@master.mycluster.org
set server default_queue = portalq
set server log_events = 511
set server mail_from = adm
set server resources_default.nodect = 1
set server resources_default.walltime = 00:15:00
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server queue_centric_limits = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 183
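
As a sanity check on the limits above, the routing through portalq can 
be modelled in a few lines of Python. This is only a sketch under stated 
assumptions: it assumes the route queue tries its destinations in the 
listed order and takes the first one whose resources_min.ncpus and 
resources_max.ncpus bounds accept the value being compared. The function 
name route and the choice of comparing either the node count or 
nodes * ppn are illustrative and do not describe Torque's actual 
implementation.

```python
# Limits copied from the queue definitions above, in the order of
# portalq's route_destinations: (name, resources_min.ncpus, resources_max.ncpus).
# "small" has no resources_min.ncpus, hence None.
QUEUES = [("small", None, 12), ("medium", 13, 64), ("large", 65, 168)]


def route(value):
    """Return the first queue whose min/max bounds accept value, else None."""
    for name, lower, upper in QUEUES:
        if (lower is None or value >= lower) and value <= upper:
            return name
    return None


# Compare the two interpretations for the requests from the message.
for nodes, ppn in [(2, 12), (14, 12)]:
    print(f"nodes={nodes}:ppn={ppn}: "
          f"by node count -> {route(nodes)}, "
          f"by nodes * ppn -> {route(nodes * ppn)}")
```

With these limits, comparing only the node count sends nodes=14:ppn=12 
to "medium", which matches what is observed, whereas comparing 
nodes * ppn (168) would send it to "large".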

