[torqueusers] (no subject)

Efstathiadis, Efstratios stratos at bnl.gov
Wed Mar 15 07:39:52 MST 2006


 
Thanks for the reply,
 
Maybe I should explain a little more about what I am trying to do:
My computing resource is a custom supercomputer (QCDOC, http://www.bnl.gov/lqcd/comp/)
that is already partitioned: we have partitions of 2048 nodes, 1024 nodes, etc.
Each partition is unique (number of nodes, geometry, etc.) and is mapped
to a torque queue, so jobs submitted to queue p1 will run on machine partition
p1. There is no sophisticated scheduling here: jobs simply run one after the
other (FIFO), with only one running job per partition, and the default scheduler
(pbs_sched) is sufficient.
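
For illustration, one queue per partition can be set up with qmgr along
these lines (the queue name and node count below are made-up examples,
not our real config):

    # execution queue mapped 1:1 to machine partition p1; FIFO, one job at a time
    qmgr -c "create queue p1 queue_type = execution"
    qmgr -c "set queue p1 resources_default.nodes = 2048"
    qmgr -c "set queue p1 max_running = 1"
    qmgr -c "set queue p1 enabled = true"
    qmgr -c "set queue p1 started = true"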
 
We do, however, have some identical small partitions (64 nodes each) that could be used
for code debugging before allocating larger partitions. The problem is that if I put
all of them under one queue (debug), I lose the mapping between machine partitions
and torque queues. If I use a separate queue for each partition (short, medium, large, based
on walltime), jobs may be queued in, say, the short queue while the other two remain idle.
 
If you have any ideas on what I could map my machine partitions to (other than queues) or
any other way to handle this, please let me know.
 
Thanks again,
Stratos

________________________________

From: torqueusers-bounces at supercluster.org on behalf of Garrick Staples
Sent: Tue 3/14/2006 6:13 PM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] (no subject)



On Tue, Mar 14, 2006 at 05:19:29PM -0500, Efstathiadis, Efstratios alleged:
> Hi,
>
>
> Being new to Maui, I can't tell if this is trivial or not:
> I have defined in torque three queues (short, medium, long)
> each with a resources_max.walltime of 1hr, 6hr and 12hr respectively,
> and a maximum running jobs of 1 for all three queues.
> I also have a default queue called batch. Submitted jobs get assigned to the
> appropriate queue based on the walltime specified in the qsub command.
> This works well with both pbs_sched and MAUI.

FYI, routing queues have nothing to do with the scheduler, so if your
routing works, it will work with any scheduler.
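
For reference, a routing setup like the one you describe is expressed
entirely in qmgr, roughly like this (a sketch; only the walltime limits
come from your description, the rest is assumed):

    # "batch" is a routing queue; destinations are tried in order
    qmgr -c "create queue batch queue_type = route"
    qmgr -c "set queue batch route_destinations = short"
    qmgr -c "set queue batch route_destinations += medium"
    qmgr -c "set queue batch route_destinations += long"
    qmgr -c "set queue short resources_max.walltime = 01:00:00"
    qmgr -c "set queue medium resources_max.walltime = 06:00:00"
    qmgr -c "set queue long resources_max.walltime = 12:00:00"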


> Now, if I submit many short jobs, one will be running (as expected) and all the
> others will be queued, waiting for their turn to run.
> In the meantime, the medium and the long queues may be completely idle,
> with no jobs running or queued. How can the scheduler "move" queued
> jobs from the short queue to the empty medium or long queues??
> After all, a short job can also fit into the medium or long queues.

It is doing exactly what you designed: 3 separate queues with a max of 1
running job per queue.

The scheduler won't move jobs into other queues.  Its job is to
schedule jobs within the constraints and policies that have been
established.

I don't know what your usage requirements are, but it sounds like queues
are the wrong tool for the job.  You might want to look at setting
policy in your Maui config.  Or give CRI a call and get some consulting
(they will sell you Moab, and it will probably do everything you want).
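
For example, per-class limits can live in maui.cfg instead of in the
queues, something like the following (a sketch only; whether MAXJOB=1
is actually the policy you want is an assumption):

    # maui.cfg: cap running jobs per class in the scheduler, not the queue
    CLASSCFG[short]  MAXJOB=1
    CLASSCFG[medium] MAXJOB=1
    CLASSCFG[long]   MAXJOB=1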


> And another (different) question: How can I assign specific host CPUs to queues?
> I know how to assign hosts to queues (with acl_hosts=.., keeping
> acl_host_enable false). The reason is that I have a large IBM
> SP machine with many CPUS and I would like to partition it.

TORQUE doesn't support this.  My only suggestion is to partition the
host OS; then you'd have 2 IPs, 2 pbs_moms, and 2 nodes in your server
config.
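
For instance, with the host split into two OS partitions, the server's
server_priv/nodes file would simply list both (hostnames and CPU counts
here are hypothetical):

    # server_priv/nodes: each OS partition runs its own pbs_mom
    sp-part1 np=8
    sp-part2 np=8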

And of course, consider not partitioning it.  Fragmenting your cluster
into small pieces kills overall usage by reducing scheduling
opportunities.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



