[torqueusers] [Mauiusers] Jobs going into incorrect queue

Steve Young chemadm at hamilton.edu
Wed Apr 22 11:59:12 MDT 2009


Hi,
	OK, that makes sense =). I didn't mean to imply you were wrong; I
figured it depended on the applications you use. I was just curious. I
like to hear how others do things, as it gives me ideas for our
setup ;-).

I would try submitting jobs directly to each of the execution queues to
make sure they get accepted or rejected based on the walltimes in
question. Then see how the routing queue behaves. I'm wondering whether
you're allowed to run a 2 week job on the short_2h queue directly?
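
For reference, here is a rough model of the behaviour I'd expect from the
routing queue. It is only a sketch, not Torque's actual code: it assumes the
route queue tries its destinations in the listed order and that a destination
rejects any job whose walltime exceeds its resources_max.walltime. The helper
names (parse_walltime, expected_queue) are made up for illustration, and ACLs
are ignored entirely.

```python
# Hypothetical model of walltime-based routing; not how pbs_server is implemented.

def parse_walltime(text):
    """Convert an HH:MM:SS walltime string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Destinations in route_destinations order, with resources_max.walltime
# copied from the posted 'print server' output (None = no limit set).
DESTINATIONS = [
    ("short_2h", "02:00:00"),
    ("med_12h", "12:00:00"),
    ("long_1w", "168:00:00"),
    ("unspec", None),
    ("guest", None),
]

def expected_queue(job_walltime):
    """Return the first destination whose walltime limit admits the job."""
    requested = parse_walltime(job_walltime)
    for name, limit in DESTINATIONS:
        if limit is None or requested <= parse_walltime(limit):
            return name
    return None  # no destination accepted the job

print(expected_queue("336:00:00"))  # the 2 week default walltime
print(expected_queue("01:30:00"))   # a short job
```

Under those assumptions a job carrying the 336:00:00 default should fall
through to unspec, so if a direct submission to short_2h is accepted, the
limit on that queue is the thing to look at first.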

-Steve





On Apr 22, 2009, at 1:19 PM, Philip Peartree wrote:

> The reasoning behind the long time limit is that some software we
> use is notoriously unpredictable, so it's best to give a
> longish time, knowing that most jobs will complete quickly but some can
> last nearly the full 2 weeks.
>
>
> Quoting Steve Young <chemadm at hamilton.edu>:
>
>> Hi Philip,
>> 	Ah, I see... yeah, at first glance it looks like it *should* work =).
>> I'm using routing queues, but they aren't based on walltime, so I'm not
>> sure I have any good suggestions. The routing queues I have
>> set up work as expected. What happens when you try submitting a job
>> directly to each of the execution queues? I'd think you should get
>> rejected on short_2h?
>>
>> My point before was to understand why you'd want to let them
>> default to a large amount of time instead of making it smaller so
>> the job finishes quickly and they figure out they need to put in a
>> proper walltime. If I queue up something that takes a month to run but
>> forget to put in a walltime, I wouldn't know for two weeks. Then, when
>> it was killed off by the system, I'd have to start again with the
>> proper walltime, thus taking a month to get back to where I was when
>> it ended prematurely. Anyhow, hope this helps.
>>
>> -Steve
>>
>>
>> On Apr 22, 2009, at 9:16 AM, Philip Peartree wrote:
>>
>>> Steve, you seem to have misunderstood. I have a default walltime
>>> set at 2 weeks (336 hours), and therefore the job should go into the
>>> unspec queue, but instead it is going to the short_2h queue, where it
>>> shouldn't be able to run (since the max queue walltime is 2h).
>>>
>>> I have included the full output of print server:
>>>
>>> #
>>> # Create queues and set their attributes.
>>> #
>>> #
>>> # Create and define queue short_2h
>>> #
>>> create queue short_2h
>>> set queue short_2h queue_type = Execution
>>> set queue short_2h Priority = 50
>>> set queue short_2h resources_max.walltime = 02:00:00
>>> set queue short_2h acl_group_enable = True
>>> set queue short_2h acl_groups = nmrc
>>> set queue short_2h enabled = True
>>> set queue short_2h started = True
>>> #
>>> # Create and define queue guest
>>> #
>>> create queue guest
>>> set queue guest queue_type = Execution
>>> set queue guest Priority = 10
>>> set queue guest enabled = True
>>> set queue guest started = True
>>> #
>>> # Create and define queue long_1w
>>> #
>>> create queue long_1w
>>> set queue long_1w queue_type = Execution
>>> set queue long_1w Priority = 30
>>> set queue long_1w resources_max.walltime = 168:00:00
>>> set queue long_1w acl_group_enable = True
>>> set queue long_1w acl_groups = nmrc
>>> set queue long_1w enabled = True
>>> set queue long_1w started = True
>>> #
>>> # Create and define queue med_12h
>>> #
>>> create queue med_12h
>>> set queue med_12h queue_type = Execution
>>> set queue med_12h Priority = 40
>>> set queue med_12h resources_max.walltime = 12:00:00
>>> set queue med_12h acl_group_enable = True
>>> set queue med_12h acl_groups = nmrc
>>> set queue med_12h enabled = True
>>> set queue med_12h started = True
>>> #
>>> # Create and define queue route
>>> #
>>> create queue route
>>> set queue route queue_type = Route
>>> set queue route route_destinations = short_2h
>>> set queue route route_destinations += med_12h
>>> set queue route route_destinations += long_1w
>>> set queue route route_destinations += unspec
>>> set queue route route_destinations += guest
>>> set queue route enabled = True
>>> set queue route started = True
>>> #
>>> # Create and define queue unspec
>>> #
>>> create queue unspec
>>> set queue unspec queue_type = Execution
>>> set queue unspec Priority = 20
>>> set queue unspec acl_group_enable = True
>>> set queue unspec acl_groups = nmrc
>>> set queue unspec enabled = True
>>> set queue unspec started = True
>>> #
>>> # Set server attributes.
>>> #
>>> set server scheduling = True
>>> set server acl_hosts = steel
>>> set server managers = root at steel.mib.man.ac.uk
>>> set server operators = root at steel.mib.man.ac.uk
>>> set server default_queue = route
>>> set server log_events = 511
>>> set server mail_from = adm
>>> set server query_other_jobs = True
>>> set server resources_default.walltime = 336:00:00
>>> set server scheduler_iteration = 600
>>> set server node_check_rate = 150
>>> set server tcp_timeout = 6
>>> set server queue_centric_limits = True
>>> set server mom_job_sync = True
>>> set server keep_completed = 300
>>> set server next_job_number = 9066
>>>
>>>
>>> Thanks
>>>
>>> Phil
>>>
>>>
>>> Quoting Steve Young <chemadm at hamilton.edu>:
>>>
>>>> Hi,
>>>> 	I use a server default for torque.....
>>>>
>>>> set server resources_default.walltime = 24:00:00
>>>>
>>>> This way if they don't specify anything they will default to 24
>>>> hours. I took the approach that if the user doesn't specify anything,
>>>> they should get a minimal amount of queue time. With this I don't
>>>> have to have a queue to handle unspecified. I'd rather have their job
>>>> finish fairly quickly and have them realize they didn't specify a time
>>>> than have them go for days/weeks before they realized they didn't
>>>> specify it. I'd hate to have a job run for two weeks and then end up
>>>> getting killed off because I didn't specify my time, especially for a
>>>> job that can't pick up where it left off and has to start from the
>>>> beginning again. Seems like a waste of resources to me. Not sure if
>>>> this helps you any. Could you send the rest of the qmgr output?
>>>> It's hard to tell why it's getting to the unspec queue if we can't see
>>>> the config for it.
>>>>
>>>> -Steve
>>>>
>>>>
>>>>
>>>> On Apr 21, 2009, at 1:06 PM, Philip Peartree wrote:
>>>>
>>>>> The default queue is the routing queue, which should place the job
>>>>> based on allowed time. That is why it's so puzzling that the jobs end
>>>>> up in the short_2h queue, as they should be rejected by that and the
>>>>> others until they reach the unspec queue.
>>>>>
>>>>>
>>>>> Quoting "Greenseid, Joseph M (IS)" <Joseph.Greenseid at ngc.com>:
>>>>>
>>>>>> Have you tried setting the default queue (set server default_queue =
>>>>>> unspec) in qmgr? This is how I route jobs that don't specify
>>>>>> resources to a default location...
>>>>>>
>>>>>> --Joe
>>>>>>
>>>>>> ________________________________
>>>>>>
>>>>>> From: mauiusers-bounces at supercluster.org on behalf of Philip Peartree
>>>>>> Sent: Tue 4/21/2009 12:32 PM
>>>>>> To: torqueusers at supercluster.org; mauiusers at supercluster.org
>>>>>> Subject: [Mauiusers] Jobs going into incorrect queue
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Guys
>>>>>>
>>>>>> I have a problem: jobs appear not to be routing to the correct
>>>>>> queue. My setup is as follows:
>>>>>>
>>>>>> routing queue
>>>>>> 2h queue
>>>>>> 12h queue
>>>>>> 1w queue
>>>>>> unspecified time queue (max time 2w)
>>>>>> guest queue (low priority)
>>>>>>
>>>>>> If a time is unspecified at job submission, a default time of 2w
>>>>>> (336h) is set.
>>>>>>
>>>>>> The routing queue is set up as follows (as taken from qmgr -c
>>>>>> 'print server'):
>>>>>>
>>>>>> create queue route
>>>>>> set queue route queue_type = Route
>>>>>> set queue route route_destinations = short_2h
>>>>>> set queue route route_destinations += med_12h
>>>>>> set queue route route_destinations += long_1w
>>>>>> set queue route route_destinations += unspec
>>>>>> set queue route route_destinations += guest
>>>>>> set queue route enabled = True
>>>>>> set queue route started = True
>>>>>>
>>>>>> My problem is that some jobs with unspecified time (which have
>>>>>> correctly been given a time of 336h) are ending up in the short_2h
>>>>>> queue, which has a higher priority than the other queues. Does anyone
>>>>>> know of any possible explanation for this?
>>>>>>
>>>>>> Phil Peartree
>>>>>> University of Manchester
>>>>>>
>>>>>> _______________________________________________
>>>>>> mauiusers mailing list
>>>>>> mauiusers at supercluster.org
>>>>>> http://www.supercluster.org/mailman/listinfo/mauiusers
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>>
>
>
>


