[torqueusers] pbs_sched only few jobs running
Frederick Kramer
kramer at ikf.uni-frankfurt.de
Mon Sep 6 02:28:22 MDT 2010
Thanks, Glen, Ken and Stéphan for your replies,
Ken, I should have mentioned that each of our jobs uses only one CPU core, so there should be as many running jobs as there are available cores.
Glen, here's the scheduler config. I haven't changed any of the default values:
round_robin: False all
by_queue: True prime
by_queue: True non_prime
strict_fifo: false ALL
fair_share: false ALL
load_balancing: false ALL
sort_by: shortest_job_first ALL
log_filter: 256
dedicated_prefix: ded
max_starve: 24:00:00
half_life: 24:00:00
unknown_shares: 10
sync_time: 1:00:00
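To make the listing above easier to read at a glance, here is a minimal Python sketch that splits each line into option name, value, and the optional period keyword (prime, non_prime or all). It assumes the "name: value [period]" layout seen in the listing; the function name parse_sched_config is hypothetical and not part of TORQUE.

```python
# Sketch only: parses text laid out like the scheduler config quoted above.
# parse_sched_config is an illustrative helper, not a TORQUE tool.
SCHED_CONFIG = """\
round_robin: False all
by_queue: True prime
by_queue: True non_prime
strict_fifo: false ALL
fair_share: false ALL
load_balancing: false ALL
sort_by: shortest_job_first ALL
log_filter: 256
dedicated_prefix: ded
max_starve: 24:00:00
half_life: 24:00:00
unknown_shares: 10
sync_time: 1:00:00
"""


def parse_sched_config(text):
    """Return {option: [(value, period), ...]}; period is None if absent."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so values like 24:00:00 stay intact.
        name, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        value = fields[0]
        period = fields[1].lower() if len(fields) > 1 else None
        options.setdefault(name.strip(), []).append((value, period))
    return options


if __name__ == "__main__":
    for name, entries in parse_sched_config(SCHED_CONFIG).items():
        print(name, entries)
```

Splitting on the first colon only keeps time values such as max_starve in one piece.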
What does backfilling mean?
Stéphan, here's the server config:
create queue alice_1h
set queue alice_1h queue_type = Execution
set queue alice_1h Priority = 120
set queue alice_1h max_running = 10
set queue alice_1h resources_max.walltime = 01:00:00
set queue alice_1h resources_default.nodes = 1
set queue alice_1h resources_default.walltime = 01:00:00
set queue alice_1h enabled = True
set queue alice_1h started = True
create queue alice_highstat
set queue alice_highstat queue_type = Execution
set queue alice_highstat Priority = 20
set queue alice_highstat max_running = 8
set queue alice_highstat resources_max.walltime = 72:00:00
set queue alice_highstat resources_default.nodes = 1
set queue alice_highstat resources_default.walltime = 72:00:00
set queue alice_highstat enabled = True
set queue alice_highstat started = True
create queue alice
set queue alice queue_type = Execution
set queue alice Priority = 60
set queue alice max_running = 99
set queue alice resources_max.walltime = 72:00:00
set queue alice resources_default.nodes = 1
set queue alice resources_default.walltime = 72:00:00
set queue alice enabled = True
set queue alice started = True
create queue ikf
set queue ikf queue_type = Execution
set queue ikf Priority = 90
set queue ikf max_running = 5
set queue ikf resources_max.walltime = 72:00:00
set queue ikf resources_default.nodes = 1
set queue ikf resources_default.walltime = 72:00:00
set queue ikf enabled = True
set queue ikf started = True
set server scheduling = True
set server acl_hosts = clstrmstr
set server managers = kramer at compile.new
set server operators = kramer at compile.new
set server default_queue = alice
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 13719
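Since each queue above carries its own max_running limit, it may help to see the limits side by side. The following is a minimal Python sketch that extracts Priority and max_running from "set queue ..." lines like those in the dump and lists the queues from highest to lowest priority. The helper names (queue_attributes, limits_by_priority) are hypothetical, and only a subset of the dump is embedded here. Note that the alice_highstat limit of 8 equals the number of running jobs reported earlier in this thread; whether that limit is the actual cause is not established here.

```python
import re

# Sketch only: a subset of the "set queue" lines from the server config above.
QMGR_DUMP = """\
set queue alice_1h Priority = 120
set queue alice_1h max_running = 10
set queue alice_highstat Priority = 20
set queue alice_highstat max_running = 8
set queue alice Priority = 60
set queue alice max_running = 99
set queue ikf Priority = 90
set queue ikf max_running = 5
"""

# Matches lines of the form: set queue <name> <attribute> = <value>
SET_QUEUE = re.compile(r"^set queue (\S+) (\S+) = (.+)$")


def queue_attributes(text):
    """Collect {queue: {attribute: value}} from 'set queue' lines."""
    queues = {}
    for line in text.splitlines():
        match = SET_QUEUE.match(line.strip())
        if match:
            queue, attr, value = match.groups()
            queues.setdefault(queue, {})[attr] = value.strip()
    return queues


def limits_by_priority(text):
    """Return (queue, priority, max_running) tuples, highest priority first."""
    rows = [
        (name, int(attrs["Priority"]), int(attrs["max_running"]))
        for name, attrs in queue_attributes(text).items()
        if "Priority" in attrs and "max_running" in attrs
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, priority, max_running in limits_by_priority(QMGR_DUMP):
        print(f"{name:16} priority={priority:3d} max_running={max_running}")
```

Comparing this table with the number of running jobs per queue would show whether any queue has reached its limit.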
Thanks a lot,
Frederick
On Sep 3, 2010, at 6:29 PM, Glen Beane wrote:
> On Fri, Sep 3, 2010 at 5:43 AM, Frederick Kramer
> <kramer at ikf.uni-frankfurt.de> wrote:
>> Hi there,
>>
>> we have a small cluster set up with around 20 CPU cores.
>> Currently we are facing the following problem: The queues are filled with a few hundred jobs but only 8 are running. pbsnodes says that the nodes are free.
>>
>> How can I find out what's wrong?
>> Or is this a common problem?
>
>
> What is your pbs_sched configuration? I am mostly curious about the
> "strict_fifo" and "help_starving_jobs" options.
>
>
> This is basically a FIFO scheduler, you may want to switch to Maui,
> which has backfilling capabilities.
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
==========================
Frederick Kramer
Institut für Kernphysik, IKF
Goethe-Universität
Max-von-Laue-Str. 1
D-60438 Frankfurt am Main
Tel.: +49-69-798-47061
==========================