[torqueusers] Help with Time-Share

Matt G. Ellis ellismg at gmail.com
Thu Aug 11 12:49:03 MDT 2005


Hello All,

I'm very new to TORQUE, but here's what I want to do:

I have a program that will be submitting many jobs. I would like each
job to run on its own machine (with only one job running on any
machine at a time) and the rest of the jobs to stay queued.
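Roughly, the policy I'm after looks like this toy Python simulation. It is only an illustration of the behavior I want, not how pbs_sched actually schedules, and the job and node names are made up:

```python
from collections import deque


def dispatch(jobs, nodes):
    """Give each idle node at most one job; everything else stays queued.

    jobs:  job ids in submission order (hypothetical ids)
    nodes: names of idle nodes (hypothetical names)
    Returns (assignments, still_queued), where assignments maps node -> job.
    """
    queue = deque(jobs)
    assignments = {}
    for node in nodes:
        if not queue:
            break
        # One job per node, so no machine is ever oversubscribed.
        assignments[node] = queue.popleft()
    # Whatever is left waits until a node frees up.
    return assignments, list(queue)


if __name__ == "__main__":
    running, waiting = dispatch(
        ["job1", "job2", "job3", "job4", "job5"],
        ["node01", "node02", "node03"],
    )
    print(running)
    print(waiting)
```

With five jobs and three nodes, three jobs start (one per node) and two remain queued.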

I think to do this I need to set up a timesharing system.

I've added :ts to my server_priv/nodes file for each node in my
system.  I've also added:

load_balancing: true     ALL

to my sched_config file.
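For reference, here is a rough Python sketch that reports which entries in a nodes file carry the :ts suffix. The assumed line layout ("name[:ts] [np=N] [properties]"), the default of np=1, and the host names in the example are my assumptions, not copied from my actual file:

```python
def parse_nodes_file(text):
    """Parse server_priv/nodes-style text into {name: {"ts": bool, "np": int}}.

    Assumes one node per line as 'name[:ts] [np=N] [property ...]';
    blank lines and '#' comments are skipped.
    """
    nodes = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        name, ts = fields[0], False
        # A trailing ':ts' on the host name marks a time-shared node.
        if name.endswith(":ts"):
            name, ts = name[:-3], True
        np = 1  # assumed default when no np= is given
        for field in fields[1:]:
            if field.startswith("np="):
                np = int(field[3:])
        nodes[name] = {"ts": ts, "np": np}
    return nodes


if __name__ == "__main__":
    sample = "node01:ts\nnode02:ts np=2\n# comment\nnode03 np=1\n"
    for host, info in parse_nodes_file(sample).items():
        print(host, info)
```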

When I try to submit a job with qsub -I, these two messages show up
in my log file:

08/11/2005 13:30:13;0008; pbs_sched;Job;72835.os-12;Job Deleted because it would never run
08/11/2005 13:30:13;0040; pbs_sched;Job;72835.os-12;Not enough of the right type of nodes available
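To pull these messages out of a large log, a small Python sketch like the following splits each line on its first five semicolons. The field layout (timestamp; event code; daemon; object type; object id; message) and treating the event code as a hexadecimal mask are assumptions based on how the lines above look:

```python
from datetime import datetime


def parse_log_line(line):
    """Split a semicolon-delimited PBS/TORQUE log line into named fields.

    Assumed layout: 'MM/DD/YYYY HH:MM:SS;code;daemon;type;id;message'.
    maxsplit=5 keeps any semicolons inside the message intact.
    """
    stamp, code, daemon, obj_type, obj_id, message = line.split(";", 5)
    return {
        "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
        "event": int(code, 16),  # assumed to be a hex event mask
        "daemon": daemon.strip(),
        "type": obj_type,
        "job": obj_id,
        "message": message,
    }


if __name__ == "__main__":
    line = ("08/11/2005 13:30:13;0008; pbs_sched;Job;72835.os-12;"
            "Job Deleted because it would never run")
    rec = parse_log_line(line)
    print(rec["job"], rec["message"])
```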

If I remove the :ts from my nodes file, the job is always sent to the
default node. However, under that scheme, when I submit a large number
of jobs quickly, that machine becomes overloaded and jobs never seem to
be sent to any other nodes.

Here is my queue setup.

Qmgr: print server
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_ping_rate = 300
set server node_check_rate = 150
set server tcp_timeout = 6
set server job_stat_rate = 30
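To check these settings without reading the dump by eye, a minimal Python sketch can collect the "set server" and "set queue" lines into dictionaries. It assumes every attribute line has the form "set server NAME = VALUE" or "set queue QUEUE NAME = VALUE", as in the output above, and ignores everything else:

```python
def parse_qmgr(text):
    """Collect attributes from qmgr 'print server' output.

    Returns (server, queues): server maps attribute -> value, and
    queues maps queue name -> {attribute: value}. Values stay strings.
    """
    server, queues = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip comments, 'create queue ...' and anything else.
        if not line.startswith("set "):
            continue
        left, _, value = line.partition(" = ")
        parts = left.split()
        if len(parts) == 3 and parts[1] == "server":
            server[parts[2]] = value
        elif len(parts) == 4 and parts[1] == "queue":
            queues.setdefault(parts[2], {})[parts[3]] = value
    return server, queues


if __name__ == "__main__":
    dump = """create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set server scheduling = True
set server scheduler_iteration = 600"""
    server, queues = parse_qmgr(dump)
    print(server)
    print(queues)
```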


pbsnodes shows all the nodes as ntype=time-shared.

Is timesharing even what I want here? I think it is, but I could be
wrong. What more do I need to do to get timesharing to work?

Thanks for your help!

--Matt

