[torqueusers] setting Resources_min while queue has running jobs

David Beer dbeer at adaptivecomputing.com
Wed Nov 13 11:41:16 MST 2013


On Tue, Nov 12, 2013 at 8:16 PM, Andrew Mather <mathera at gmail.com> wrote:

> Hi All
>
>
> I'd like to modify one of our queues by using resources_min to enforce
> a minimum requirement for a specific single queue on our cluster.  I'd
> like to use this parameter to force all jobs sent to this queue to
> 'ask for' 2 CPUs.
>
>
Setting resources_min doesn't handle this by itself. resources_min and
resources_max are used to filter which jobs a queue accepts, and
resources_default is used to apply defaults to jobs that don't specify a
value.


> The thing I am not sure about is what will happen to jobs already
> queued and more particularly, currently running, if they've requested
> only 1.  Will the running ones be killed off for violating the minimum
> requirements and will the queued ones simply be held forever ?
>
>
The running jobs will not be killed. I don't believe it will change the
jobs that are already queued either, as these limits and defaults are
applied at the time the job is queued.


> Is it safe to do this while these jobs exist, or should I stop the
> queue and allow those jobs to drain before making this type of change
> ?  There's currently a thousand or so jobs queued or running via this
> queue, some of which are hundreds of hours into their 1500-hour
> walltime run, so I don't want to kill them off !
>
>
Draining the queue first, as you describe, is obviously the safest option,
but I don't think it is required.


> Also, once this change is made, would a specific request for 1 CPU in
> the submission script override this value ?
>
>
If you only use resources_min, then yes. You need to use a combination of
resources_min and resources_default.


> The reason for the change is that this particular queue is currently
> sending a large number of small, CPU intensive jobs onto our nodes,
> which currently have hyperthreading enabled, which is causing the
> machines to bog down and performance drops right off.  This is likely
> to be a long-term state of affairs due to the nature of some of the
> current projects using the cluster.
>
> In general, we get sufficient benefit from the hyperthreading that
> we'd prefer to leave it on cluster-wide if we could.
>
> Since all the problem jobs are coming down one particular queue, I
> figured that if we could tweak the levers of this queue, we wouldn't
> need to mess with the rest, which on the whole is working fine.
>
> Thanks for any help you can provide and see you in Denver next week !
>
>
If you want to force all jobs to request at least 2 CPUs, perhaps the
easiest way to accomplish this is to 1) instruct all users to do so and 2)
create a submit filter that outright rejects jobs requesting fewer. You can
also do it using resources_min and resources_default, but you need to
remember to set the min in every queue, or the rejected jobs will just get
routed wherever you forgot to set it.
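
A minimal sketch of what such a submit filter could look like, in Python.
It rests on several assumptions that should be checked against the
documentation for your TORQUE version: that qsub feeds the job script to the
filter on stdin and passes its own command-line arguments as the filter's
arguments, that the filter must echo the script to stdout, and that a
non-zero exit status rejects the job. The queue name "smallq", the helper
names, and the rule that a job with no nodes= request counts as 1 core are
all hypothetical choices for illustration. Only nodes=...:ppn=... requests
are understood; other forms such as procs= are ignored.

```python
import re
import sys

MIN_CORES = 2      # policy discussed in this thread
QUEUE = "smallq"   # hypothetical name of the affected queue


def pbs_tokens(script_text, argv):
    """Option tokens from '#PBS' lines, then the command line (which wins)."""
    tokens = []
    for line in script_text.splitlines():
        if line.startswith("#PBS"):
            tokens.extend(line[4:].split())
    return tokens + list(argv)


def option_values(tokens, flag):
    """Every value given for flag, accepting both '-l x' and '-lx' forms."""
    values = []
    it = iter(tokens)
    for tok in it:
        if tok == flag:
            values.append(next(it, ""))
        elif tok.startswith(flag) and len(tok) > len(flag):
            values.append(tok[len(flag):])
    return values


def node_cores(spec):
    """Cores for one node spec such as '2:ppn=4', 'node01:ppn=2' or '3'."""
    parts = spec.split(":")
    count = int(parts[0]) if parts[0].isdigit() else 1
    ppn = 1
    for part in parts[1:]:
        match = re.fullmatch(r"ppn=(\d+)", part)
        if match:
            ppn = int(match.group(1))
    return count * ppn


def requested_cores(resource_lists):
    """Cores implied by the last 'nodes=' request; assumed 1 if none given."""
    cores = 1
    for resource_list in resource_lists:
        for item in resource_list.split(","):
            if item.startswith("nodes="):
                cores = sum(node_cores(s) for s in item[6:].split("+"))
    return cores


def accept(script_text, argv):
    """False if the job names QUEUE explicitly and asks for too few cores."""
    tokens = pbs_tokens(script_text, argv)
    queues = option_values(tokens, "-q")
    queue = queues[-1].split("@")[0] if queues else ""
    cores = requested_cores(option_values(tokens, "-l"))
    return not (queue == QUEUE and cores < MIN_CORES)


def main():
    script = sys.stdin.read()
    if not accept(script, sys.argv[1:]):
        sys.stderr.write("%s requires at least %d cores\n" % (QUEUE, MIN_CORES))
        return 1                  # assumed: non-zero status rejects the job
    sys.stdout.write(script)      # pass the script through unchanged
    return 0

# To run this as an actual filter, end the file with: sys.exit(main())
```

Note that this sketch only catches jobs that name the queue explicitly with
-q. If the affected queue is also the default queue, or jobs reach it through
a routing queue, the filter would not see a queue name and would let them
through, so the check would need to be adapted.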

David


-- 
David Beer | Senior Software Engineer
Adaptive Computing