[torqueusers] pbs_sched only few jobs running

Frederick Kramer kramer at ikf.uni-frankfurt.de
Mon Sep 6 02:28:22 MDT 2010


Thanks, Glen, Ken and Stéphan, for your replies.

Ken, I should have mentioned that all our jobs use only one CPU core each, so there should be as many running jobs as there are available CPU cores.

Glen, here's the scheduler config. I haven't changed any of the default values:

round_robin: False	all
by_queue: True		prime
by_queue: True		non_prime
strict_fifo: false	ALL
fair_share: false	ALL
load_balancing: false	ALL
sort_by: shortest_job_first	ALL
log_filter: 256
dedicated_prefix: ded
max_starve: 24:00:00
half_life: 24:00:00
unknown_shares: 10
sync_time: 1:00:00
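
To make the listing above easier to check against Glen's question, here is a minimal Python sketch that parses the pasted text into a dictionary. It assumes each line has the form "option: value [prime|non_prime|all]"; the names SCHED_CONFIG and parse_sched_config are made up for this illustration and are not part of TORQUE. It only looks at the text shown here, so it says nothing about built-in defaults for options that are not listed.

```python
# Sketch only: parses the sched_config text pasted above.
# SCHED_CONFIG and parse_sched_config are hypothetical names, not TORQUE APIs.
SCHED_CONFIG = """\
round_robin: False all
by_queue: True prime
by_queue: True non_prime
strict_fifo: false ALL
fair_share: false ALL
load_balancing: false ALL
sort_by: shortest_job_first ALL
log_filter: 256
dedicated_prefix: ded
max_starve: 24:00:00
half_life: 24:00:00
unknown_shares: 10
sync_time: 1:00:00
"""


def parse_sched_config(text):
    """Return {option: [(value, applies_to or None), ...]}."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so values like 24:00:00 stay intact.
        name, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        value = fields[0]
        applies_to = fields[1].lower() if len(fields) > 1 else None
        options.setdefault(name.strip(), []).append((value, applies_to))
    return options


config = parse_sched_config(SCHED_CONFIG)
for option in ("strict_fifo", "help_starving_jobs"):
    print(option, "->", config.get(option, "not present in this listing"))
```

For the listing above, strict_fifo comes out as false for all times, and help_starving_jobs does not appear in the pasted text at all.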

What does backfilling mean?

Stéphan, here's the server config:

create queue alice_1h
set queue alice_1h queue_type = Execution
set queue alice_1h Priority = 120
set queue alice_1h max_running = 10
set queue alice_1h resources_max.walltime = 01:00:00
set queue alice_1h resources_default.nodes = 1
set queue alice_1h resources_default.walltime = 01:00:00
set queue alice_1h enabled = True
set queue alice_1h started = True

create queue alice_highstat
set queue alice_highstat queue_type = Execution
set queue alice_highstat Priority = 20
set queue alice_highstat max_running = 8
set queue alice_highstat resources_max.walltime = 72:00:00
set queue alice_highstat resources_default.nodes = 1
set queue alice_highstat resources_default.walltime = 72:00:00
set queue alice_highstat enabled = True
set queue alice_highstat started = True

create queue alice
set queue alice queue_type = Execution
set queue alice Priority = 60
set queue alice max_running = 99
set queue alice resources_max.walltime = 72:00:00
set queue alice resources_default.nodes = 1
set queue alice resources_default.walltime = 72:00:00
set queue alice enabled = True
set queue alice started = True

create queue ikf
set queue ikf queue_type = Execution
set queue ikf Priority = 90
set queue ikf max_running = 5
set queue ikf resources_max.walltime = 72:00:00
set queue ikf resources_default.nodes = 1
set queue ikf resources_default.walltime = 72:00:00
set queue ikf enabled = True
set queue ikf started = True

set server scheduling = True
set server acl_hosts = clstrmstr
set server managers = kramer at compile.new
set server operators = kramer at compile.new
set server default_queue = alice
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 13719
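
Since each queue has its own max_running limit, a small Python sketch like the following can put those limits side by side with the observed number of running jobs. It assumes the qmgr output follows the "set queue NAME ATTRIBUTE = VALUE" pattern shown above; QMGR_OUTPUT, queue_attributes, max_running_limits and OBSERVED_RUNNING are hypothetical names used only for this illustration. A matching number is only a hint about where to look, not a diagnosis.

```python
import re

# Sketch only: the max_running lines copied from the server config above.
QMGR_OUTPUT = """\
set queue alice_1h max_running = 10
set queue alice_highstat max_running = 8
set queue alice max_running = 99
set queue ikf max_running = 5
"""

# Number of running jobs reported in the original question.
OBSERVED_RUNNING = 8

SET_QUEUE = re.compile(r"^set queue (\S+) (\S+) = (.+)$")


def queue_attributes(text):
    """Return {queue: {attribute: value}} from 'set queue' lines."""
    queues = {}
    for line in text.splitlines():
        match = SET_QUEUE.match(line.strip())
        if match:
            queue, attribute, value = match.groups()
            queues.setdefault(queue, {})[attribute] = value
    return queues


def max_running_limits(text):
    """Return {queue: max_running as int} for queues that set a limit."""
    return {
        queue: int(attrs["max_running"])
        for queue, attrs in queue_attributes(text).items()
        if "max_running" in attrs
    }


limits = max_running_limits(QMGR_OUTPUT)
for queue, limit in sorted(limits.items(), key=lambda item: item[1]):
    note = "  <- equals the observed running count" if limit == OBSERVED_RUNNING else ""
    print(f"{queue:16s} max_running = {limit}{note}")
```

With the values above, only alice_highstat has a limit equal to the 8 running jobs, so it may be worth checking which queue the running and waiting jobs are actually in.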


Thanks a lot,
Frederick



On Sep 3, 2010, at 6:29 PM, Glen Beane wrote:

> On Fri, Sep 3, 2010 at 5:43 AM, Frederick Kramer
> <kramer at ikf.uni-frankfurt.de> wrote:
>> Hi there,
>> 
>> we have a small cluster set up with around 20 CPU cores.
>> Currently we are facing the following problem: The queues are filled with a few hundred jobs but only 8 are running. pbsnodes says that the nodes are free.
>> 
>> How can I find out what's wrong?
>> Or is this a common problem?
> 
> 
> What is your pbs_sched configuration?  I am mostly curious about the
> "strict_fifo" and "help_starving_jobs" options.
> 
> 
> This is basically a FIFO scheduler, you may want to switch to Maui,
> which has backfilling capabilities.




==========================
Frederick Kramer
Institut für Kernphysik, IKF
Goethe-Universität
Max-von-Laue-Str. 1
D-60438 Frankfurt am Main
Tel.: +49-69-798-47061
==========================


