[Mauiusers] [torqueusers] Jobs going into incorrect queue

Greenseid, Joseph M (IS) Joseph.Greenseid at ngc.com
Wed Apr 22 11:41:02 MDT 2009


does it fail if you submit a 300-hour job directly to the short_2h queue?
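
The point of that test is the comparison below: short_2h has resources_max.walltime = 02:00:00, so a direct request for 300 hours should be refused if the limit is being enforced. This is only a minimal Python sketch of the arithmetic, not TORQUE's own code; the helper names walltime_to_seconds and exceeds_limit are made up for illustration.

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert a [[HH:]MM:]SS walltime string to seconds."""
    seconds = 0
    for field in walltime.split(":"):
        # Each further field is one unit smaller, so scale by 60 and add.
        seconds = seconds * 60 + int(field)
    return seconds


def exceeds_limit(requested: str, queue_max: str) -> bool:
    """Return True if the requested walltime is larger than the queue maximum."""
    return walltime_to_seconds(requested) > walltime_to_seconds(queue_max)


# 300 hours against the 2-hour limit configured on short_2h.
print(exceeds_limit("300:00:00", "02:00:00"))
# A 90-minute request fits under the same limit.
print(exceeds_limit("01:30:00", "02:00:00"))
```

On the cluster itself the equivalent check would be a direct submission along the lines of qsub -q short_2h -l walltime=300:00:00 with whatever job script is normally used.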
 
--Joe

________________________________

From: mauiusers-bounces at supercluster.org on behalf of Philip Peartree
Sent: Wed 4/22/2009 1:19 PM
To: Steve Young
Cc: torqueusers at supercluster.org; mauiusers at supercluster.org
Subject: Re: [Mauiusers] [torqueusers] Jobs going into incorrect queue



The reasoning behind the long time limit is that some software we use
is notoriously unpredictable, so it's best to give a
longish time, knowing that most jobs will complete quickly but some can
last nearly the full 2 weeks.


Quoting Steve Young <chemadm at hamilton.edu>:

> Hi Philip,
>       Ah, I see... yes, at first glance it looks like it *should* work =). I'm
> using routing queues, but they aren't based on walltime, so I'm not sure
> if I have any good suggestions. The routing queues I have set up
> work as expected. What happens when you try submitting a job to each
> of the execution queues? I'd think you should get rejected on
> short_2h?
>
> My point before was to understand why you'd want to let them default
> to a large amount of time instead of making it smaller, so the job
> finishes quickly and they figure out they need to put in a proper
> walltime. If I queue up something that takes a month to run but
> forget to put in a walltime, I wouldn't know for two weeks. Then, when
> it was killed off by the system, I'd have to start again with the
> proper walltime, thus taking a month to get back to where I was when
> it ended prematurely. Anyhow, hope this helps.
>
> -Steve
>
>
> On Apr 22, 2009, at 9:16 AM, Philip Peartree wrote:
>
>> Steve, you seem to have misunderstood. I have a default walltime
>> set at 2 weeks (336 hours), so the job should go into the
>> unspec queue, but instead it is going to the short_2h queue, where it
>> shouldn't be able to run (since the max queue walltime is 2h).
>>
>> I have included the full output of print server:
>>
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue short_2h
>> #
>> create queue short_2h
>> set queue short_2h queue_type = Execution
>> set queue short_2h Priority = 50
>> set queue short_2h resources_max.walltime = 02:00:00
>> set queue short_2h acl_group_enable = True
>> set queue short_2h acl_groups = nmrc
>> set queue short_2h enabled = True
>> set queue short_2h started = True
>> #
>> # Create and define queue guest
>> #
>> create queue guest
>> set queue guest queue_type = Execution
>> set queue guest Priority = 10
>> set queue guest enabled = True
>> set queue guest started = True
>> #
>> # Create and define queue long_1w
>> #
>> create queue long_1w
>> set queue long_1w queue_type = Execution
>> set queue long_1w Priority = 30
>> set queue long_1w resources_max.walltime = 168:00:00
>> set queue long_1w acl_group_enable = True
>> set queue long_1w acl_groups = nmrc
>> set queue long_1w enabled = True
>> set queue long_1w started = True
>> #
>> # Create and define queue med_12h
>> #
>> create queue med_12h
>> set queue med_12h queue_type = Execution
>> set queue med_12h Priority = 40
>> set queue med_12h resources_max.walltime = 12:00:00
>> set queue med_12h acl_group_enable = True
>> set queue med_12h acl_groups = nmrc
>> set queue med_12h enabled = True
>> set queue med_12h started = True
>> #
>> # Create and define queue route
>> #
>> create queue route
>> set queue route queue_type = Route
>> set queue route route_destinations = short_2h
>> set queue route route_destinations += med_12h
>> set queue route route_destinations += long_1w
>> set queue route route_destinations += unspec
>> set queue route route_destinations += guest
>> set queue route enabled = True
>> set queue route started = True
>> #
>> # Create and define queue unspec
>> #
>> create queue unspec
>> set queue unspec queue_type = Execution
>> set queue unspec Priority = 20
>> set queue unspec acl_group_enable = True
>> set queue unspec acl_groups = nmrc
>> set queue unspec enabled = True
>> set queue unspec started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = steel
>> set server managers = root at steel.mib.man.ac.uk
>> set server operators = root at steel.mib.man.ac.uk
>> set server default_queue = route
>> set server log_events = 511
>> set server mail_from = adm
>> set server query_other_jobs = True
>> set server resources_default.walltime = 336:00:00
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server queue_centric_limits = True
>> set server mom_job_sync = True
>> set server keep_completed = 300
>> set server next_job_number = 9066
>>
>>
>> Thanks
>>
>> Phil
>>
>>
>> Quoting Steve Young <chemadm at hamilton.edu>:
>>
>>> Hi,
>>>     I use a server default for torque.....
>>>
>>> set server resources_default.walltime = 24:00:00
>>>
>>> This way if they don't specify anything they will default to 24
>>> hours.  I took the approach that if the user doesn't specify anything
>>> that they should get a minimal amount of queue time. With this I don't
>>> have to have a queue to handle unspecified. I'd rather have their job
>>> finish fairly quickly and have them realize they didn't specify a time than
>>> have them go for days/weeks before they realize they didn't specify
>>> it. I'd hate to have a job run for two weeks and then end up getting
>>> killed off because I didn't specify my time, especially for a job that
>>> can't pick up where it left off and has to start from the beginning
>>> again. Seems like a waste of resources to me. Not sure if this helps
>>> you any. Could you send the rest of the qmgr output?
>>> It's hard to tell why it's getting to the unspec queue if we can't see
>>> the config for it.
>>>
>>> -Steve
>>>
>>>
>>>
>>> On Apr 21, 2009, at 1:06 PM, Philip Peartree wrote:
>>>
>>>> The default queue is the routing queue, which should place the job
>>>> based on allowed time. That is why it's so puzzling that the jobs end
>>>> up in the short_2h queue, as they should be rejected by that queue and
>>>> the others until they reach the unspec queue.
>>>>
>>>>
>>>> Quoting "Greenseid, Joseph M (IS)" <Joseph.Greenseid at ngc.com>:
>>>>
>>>>> have you tried to set the default queue (set server default_queue =
>>>>> unspec) in qmgr?  this is how i route jobs that don't specify
>>>>> resources to a default location...
>>>>>
>>>>> --Joe
>>>>>
>>>>> ________________________________
>>>>>
>>>>> From: mauiusers-bounces at supercluster.org on behalf of Philip Peartree
>>>>> Sent: Tue 4/21/2009 12:32 PM
>>>>> To: torqueusers at supercluster.org; mauiusers at supercluster.org
>>>>> Subject: [Mauiusers] Jobs going into incorrect queue
>>>>>
>>>>>
>>>>>
>>>>> Hi Guys
>>>>>
>>>>> I have a problem: jobs do not appear to be routed to the correct
>>>>> queue. My setup is as follows:
>>>>>
>>>>> routing queue
>>>>> 2h queue
>>>>> 12h queue
>>>>> 1w queue
>>>>> unspecified time queue (max time 2w)
>>>>> guest queue (low priority)
>>>>>
>>>>> If a time is unspecified at job submission, a default time of 2w
>>>>> (336h) is set.
>>>>>
>>>>> The routing queue is set up as follows (as taken from qmgr -c 'print
>>>>> server'):
>>>>>
>>>>> create queue route
>>>>> set queue route queue_type = Route
>>>>> set queue route route_destinations = short_2h
>>>>> set queue route route_destinations += med_12h
>>>>> set queue route route_destinations += long_1w
>>>>> set queue route route_destinations += unspec
>>>>> set queue route route_destinations += guest
>>>>> set queue route enabled = True
>>>>> set queue route started = True
>>>>>
>>>>> My problem is that some jobs with unspecified time (which have
>>>>> correctly been given a time of 336h) are ending up in the short_2h
>>>>> queue, which has a higher priority than the other queues. Does anyone
>>>>> know of any possible explanation for this?
>>>>>
>>>>> Phil Peartree
>>>>> University of Manchester
>>>>>
>>>>> _______________________________________________
>>>>> mauiusers mailing list
>>>>> mauiusers at supercluster.org
>>>>> http://www.supercluster.org/mailman/listinfo/mauiusers
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>

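For reference, the routing that the quoted print server output is expected to produce can be sketched in a few lines of Python. This is a simplified model, not TORQUE's implementation: it walks route_destinations in order, compares the job's walltime with each queue's resources_max.walltime, and ignores ACLs and every other attribute. The DESTINATIONS table and the route function are hypothetical names. The model also bakes in one unconfirmed assumption, marked in the code: a limit is only checked against a walltime the job already carries when it is routed. Under that assumption a job submitted without a walltime would be accepted by the first destination, short_2h, which matches the symptom described, but nothing in this thread confirms that this is what the server actually does.

```python
from typing import Optional

# (queue name, resources_max.walltime in hours, or None when no limit is set),
# listed in the order of route_destinations in the quoted configuration.
DESTINATIONS = [
    ("short_2h", 2),
    ("med_12h", 12),
    ("long_1w", 168),
    ("unspec", None),
    ("guest", None),
]


def route(job_walltime_hours: Optional[int]) -> Optional[str]:
    """Return the first destination whose walltime limit admits the job.

    ASSUMPTION (not confirmed in the thread): a limit is only compared with
    a walltime the job carries at routing time, so a job with no walltime
    passes every check and is taken by the first destination.
    """
    for name, max_hours in DESTINATIONS:
        if max_hours is None or job_walltime_hours is None:
            return name
        if job_walltime_hours <= max_hours:
            return name
    return None


print(route(336))   # walltime already set to the 336h default
print(route(None))  # no walltime attached yet when the job is routed
print(route(10))    # an explicit 10-hour request
```

If the 336h default is already on the job when it reaches the route queue, the model sends it to unspec, which is the behaviour Philip expects. Comparing the Resource_List.walltime shown by qstat -f for a job sitting in short_2h with the time at which it was routed might help tell the two cases apart.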

