[torqueusers] pbs_sched only few jobs running

Frederick Kramer kramer at ikf.uni-frankfurt.de
Thu Sep 16 03:30:07 MDT 2010


Hi again, sorry to bother you once more about this.
Still only 8 jobs are running, even though we have around 20 free nodes with no other load.

Does anybody have an idea?

Thanks & best regards
Frederick



On Sep 6, 2010, at 10:28 AM, Frederick Kramer wrote:

> Thanks, Glen, Ken and Stéphan, for your replies.
> 
> Ken, I should have mentioned that each of our jobs uses only one CPU core, so there should be as many running jobs as there are available CPUs.
> 
> Glen, here's the scheduler config. I didn't change any of the default values:
> 
> round_robin: False	all
> by_queue: True		prime
> by_queue: True		non_prime
> strict_fifo: false	ALL
> fair_share: false	ALL
> load_balancing: false	ALL
> sort_by: shortest_job_first	ALL
> log_filter: 256
> dedicated_prefix: ded
> max_starve: 24:00:00
> half_life: 24:00:00
> unknown_shares: 10
> sync_time: 1:00:00
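Glen asked specifically about strict_fifo and help_starving_jobs. The file format shown above (`option: value`, optionally followed by prime, non_prime or all) is simple enough to check mechanically. The Python sketch below assumes exactly that format and is only an illustration: `parse_sched_config` and `SAMPLE` are hypothetical names, not part of TORQUE. Run against these lines, it shows that strict_fifo is set to false and that help_starving_jobs does not appear in this file at all.

```python
# Hypothetical helper: parse pbs_sched config lines of the form
# "option: value [time_spec]", where time_spec is prime, non_prime or all.

def parse_sched_config(text):
    """Return {option: [(value, time_spec), ...]}; time_spec is None if absent."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so values like 24:00:00 stay intact.
        name, _, rest = line.partition(":")
        fields = rest.split()
        value = fields[0] if fields else ""
        spec = fields[1] if len(fields) > 1 else None
        # Options such as by_queue may appear once per time_spec.
        options.setdefault(name.strip(), []).append((value, spec))
    return options


# A few lines copied from the configuration above.
SAMPLE = """\
by_queue: True prime
by_queue: True non_prime
strict_fifo: false ALL
max_starve: 24:00:00
"""

opts = parse_sched_config(SAMPLE)
print(opts["strict_fifo"])            # [('false', 'ALL')]
print("help_starving_jobs" in opts)   # False: the option is not listed
```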
> 
> What does backfilling mean?
> 
> Stéphan, here's the server config:
> 
> create queue alice_1h
> set queue alice_1h queue_type = Execution
> set queue alice_1h Priority = 120
> set queue alice_1h max_running = 10
> set queue alice_1h resources_max.walltime = 01:00:00
> set queue alice_1h resources_default.nodes = 1
> set queue alice_1h resources_default.walltime = 01:00:00
> set queue alice_1h enabled = True
> set queue alice_1h started = True
> 
> create queue alice_highstat
> set queue alice_highstat queue_type = Execution
> set queue alice_highstat Priority = 20
> set queue alice_highstat max_running = 8
> set queue alice_highstat resources_max.walltime = 72:00:00
> set queue alice_highstat resources_default.nodes = 1
> set queue alice_highstat resources_default.walltime = 72:00:00
> set queue alice_highstat enabled = True
> set queue alice_highstat started = True
> 
> create queue alice
> set queue alice queue_type = Execution
> set queue alice Priority = 60
> set queue alice max_running = 99
> set queue alice resources_max.walltime = 72:00:00
> set queue alice resources_default.nodes = 1
> set queue alice resources_default.walltime = 72:00:00
> set queue alice enabled = True
> set queue alice started = True
> 
> create queue ikf
> set queue ikf queue_type = Execution
> set queue ikf Priority = 90
> set queue ikf max_running = 5
> set queue ikf resources_max.walltime = 72:00:00
> set queue ikf resources_default.nodes = 1
> set queue ikf resources_default.walltime = 72:00:00
> set queue ikf enabled = True
> set queue ikf started = True
> 
> set server scheduling = True
> set server acl_hosts = clstrmstr
> set server managers = kramer@compile.new
> set server operators = kramer@compile.new
> set server default_queue = alice
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server mom_job_sync = True
> set server keep_completed = 300
> set server next_job_number = 13719
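The queue limits above bound how many jobs can run at once, independently of how many nodes are free: in TORQUE, a queue's max_running attribute caps the number of jobs from that queue that may run simultaneously. The Python sketch below does that arithmetic under stated assumptions: every job uses one core (as stated above), the cluster has about 20 cores, and the per-queue job counts passed in are hypothetical, not taken from this cluster. `running_cap` is an illustrative name only.

```python
# max_running values copied from the qmgr output above.
MAX_RUNNING = {"alice_1h": 10, "alice_highstat": 8, "alice": 99, "ikf": 5}


def running_cap(jobs_per_queue, max_running, cpus):
    """Upper bound on simultaneously running single-core jobs.

    jobs_per_queue maps a queue name to the number of jobs in it (running or
    waiting). Each queue contributes at most its max_running limit, and the
    total can never exceed the number of CPU cores.
    """
    from_queues = sum(
        min(count, max_running.get(queue, count))  # no limit -> all jobs count
        for queue, count in jobs_per_queue.items()
    )
    return min(cpus, from_queues)


# Hypothetical job distributions on a cluster with about 20 cores.
print(running_cap({"alice_highstat": 300}, MAX_RUNNING, 20))  # 8
print(running_cap({"alice": 300}, MAX_RUNNING, 20))           # 20
```

If most of the waiting jobs were in alice_highstat, its max_running = 8 alone would hold the cluster at 8 running jobs however many nodes pbsnodes reports as free. The thread does not say how the jobs are distributed over the queues, so this is something to check rather than a confirmed cause.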
> 
> 
> Thanks a lot,
> Frederick
> 
> 
> 
> On Sep 3, 2010, at 6:29 PM, Glen Beane wrote:
> 
>> On Fri, Sep 3, 2010 at 5:43 AM, Frederick Kramer
>> <kramer at ikf.uni-frankfurt.de> wrote:
>>> Hi there,
>>> 
>>> we have set up a small cluster with around 20 CPU cores.
>>> Currently we are facing the following problem: the queues hold a few hundred jobs, but only 8 are running, even though pbsnodes reports the nodes as free.
>>> 
>>> How can I find out what's wrong?
>>> Or is this a common problem?
>> 
>> 
>> What is your pbs_sched configuration? I am mostly curious about the
>> "strict_fifo" and "help_starving_jobs" options.
>> 
>> 
>> This is basically a FIFO scheduler; you may want to switch to Maui,
>> which has backfilling capabilities.
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
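Since the thread asks what backfilling means, here is a minimal Python sketch of the idea. It is a simplified, conservative variant of what is often called EASY backfilling, not Maui's actual implementation. Assumptions: each job declares a core count and a requested walltime, running jobs are given as (end_time, cpus) pairs, and `Job`, `shadow_time` and `schedule` are illustrative names. A plain FIFO scheduler stops at the first job that does not fit; backfilling reserves a start time for that blocked job and lets later, shorter jobs use the idle cores as long as they finish before the reservation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int        # CPU cores requested
    walltime: float  # requested walltime, in hours


def shadow_time(now, free_cpus, running, needed):
    """Earliest time at which `needed` cores are idle.

    `running` holds (end_time, cpus) pairs for jobs already executing.
    """
    avail = free_cpus
    if avail >= needed:
        return now
    for end, cpus in sorted(running):
        avail += cpus
        if avail >= needed:
            return end
    raise ValueError("job needs more cores than the cluster has")


def schedule(queue, free_cpus, running, now=0.0):
    """Return the names of jobs started at time `now`.

    Plain FIFO first, then a simple backfill pass behind a blocked head job.
    """
    started = []
    waiting = list(queue)
    running = list(running)
    # FIFO: start jobs from the head of the queue while they fit.
    while waiting and waiting[0].cpus <= free_cpus:
        job = waiting.pop(0)
        free_cpus -= job.cpus
        running.append((now + job.walltime, job.cpus))
        started.append(job.name)
    if not waiting:
        return started
    # The head job is blocked: find when it could start and reserve that time.
    head = waiting[0]
    reserve = shadow_time(now, free_cpus, running, head.cpus)
    # Backfill: a later job may start now if it fits in the idle cores and
    # finishes by the reservation, so it cannot delay the head job.
    for job in waiting[1:]:
        if job.cpus <= free_cpus and now + job.walltime <= reserve:
            free_cpus -= job.cpus
            started.append(job.name)
    return started


# 4-core cluster: a 2-core job is running until t = 10, so 2 cores are idle.
queue = [Job("A", 4, 5.0), Job("B", 1, 3.0), Job("C", 2, 20.0)]
print(schedule(queue, free_cpus=2, running=[(10.0, 2)]))  # ['B']
```

Here the 4-core job A at the head of the queue cannot start before t = 10, so a strict FIFO scheduler would leave the 2 idle cores unused. With backfilling, the 1-core, 3-hour job B starts immediately because it will finish before A's reservation, while the 20-hour job C would delay A and therefore waits.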
> 
> 
> 
> 
> ==========================
> Frederick Kramer
> Institut für Kernphysik, IKF
> Goethe-Universität
> Max-von-Laue-Str. 1
> D-60438 Frankfurt am Main
> Tel.: +49-69-798-47061
> ==========================
