[torqueusers] Considerations for Clusters Running Lots of Small Jobs

John S. Urban urbanjost at comcast.net
Thu Dec 4 19:59:37 MST 2008


A simple approach based on experience: combine where you can, experiment where you can't.

I have been involved with diverse job mixes, leaning towards large jobs. In
our environment we always encouraged users with many small jobs to combine
their tasks into loops or multiple executions within a single job, partly
because our environment has relatively time-consuming prologue and epilogue
steps required to set up and clean up the job environment.
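
For example, a trivial job script that folds a pile of tiny runs into one
submission might look something like this; the "mytask" program and the
case_*.in input files are placeholders, not anything from our site:

    #!/bin/sh
    #PBS -N combined_tasks
    #PBS -l nodes=1:ppn=1
    # Hypothetical sketch: run many short tasks inside one TORQUE job so
    # the prologue/epilogue cost is paid once instead of thousands of times.
    cd $PBS_O_WORKDIR
    for input in case_*.in
    do
        ./mytask "$input" > "${input%.in}.out"
    done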

Even so, we have had users with sets of 36,000 jobs per analysis that only
ran a few seconds each and that, for various reasons, could not be combined
into larger jobs. That's probably at least close to what you're talking
about? We experimented with many setups to maximize throughput for this
class of job, as is our habit. In our case we found that if we "lied" and
set np (the number of processors supposedly on the node) to 20 for the 4-CPU
nodes and 7 for the 2-CPU nodes used for this queue (many small jobs
submitted as rapidly as possible), and scheduled a constant number of jobs
per node instead of scheduling by node load, we maximized both our
utilization and throughput for these jobs (a sketch of what that looks like
in the nodes file follows this paragraph).

But our experience has been that the rule is: if you can't combine small
jobs, experiment, whether with PBS/TORQUE, LSF, NQS, or anything else.
Scheduling very diverse job mixes or many small jobs is much more demanding
than scheduling parallel jobs that use hundreds of CPUs for weeks, or very
homogeneous loads, as far as I'm concerned. And experiment again if anything
significant changes regarding your platform (a new cluster, new I/O servers,
many nodes added, ...). Many issues (I/O load, whether large files are
transferred at job termination, time spent per epilogue/prologue execution,
number of CPUs per box, whether boxes are SMP or not, available cache,
memory size of jobs, number of lines of input per job, whether you have NUMA
control, ...) are large components of small-job scheduling, yet can often be
ignored when optimizing for large, long-running, floating-point-intensive
jobs.
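
For what it's worth, the "lying about np" part is nothing more than what the
server_priv/nodes file claims. A hypothetical sketch, with invented hostnames
and with the np values being the deliberately inflated ones rather than the
physical CPU counts:

    node01 np=20
    node02 np=20
    node03 np=7
    node04 np=7

Here node01/node02 are really 4-CPU boxes and node03/node04 are 2-CPU boxes,
so pbs_server keeps a constant, larger number of short jobs in flight on each
node instead of placing them by node load.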

In general, it's usually good to let small jobs fight each other a bit for
resources, but not the big jobs. Let the little jobs swarm around like ants,
but don't put a lot of bulls in the china shop. Little jobs require you to
schedule more the way a system kernel does. The exceptions are extremes such
as little jobs that take all the memory, or jobs that do a lot of I/O over
NFS (running many at the same time from many nodes is probably just going to
overload your NFS server if you don't have a large parallel I/O box).

In one case where the jobs were REALLY short but we had a million of them, we
just had a script running qrun commands in a loop as soon as jobs were sensed
on a queue that was closed as far as the scheduler was concerned (if the
queue is visible to the scheduler, it wastes a lot of time trying to schedule
the jobs for no reason). If you do leave the scheduler running, it should
probably be FIFO so it doesn't waste time thinking too hard about it.
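
Very roughly, that loop amounted to something like the following; the queue
name "shortq" and the one-second poll are made up, and it assumes the queue
has been taken out of normal scheduling (e.g. with qmgr) and that whoever
runs it has operator rights for qrun:

    #!/bin/sh
    # Sketch only: force-run whatever is sitting queued on a queue the
    # scheduler is ignoring. "shortq" is a placeholder queue name.
    while true
    do
        # qselect prints the job ids in the Q (queued) state on shortq
        for job in `qselect -q shortq -s Q`
        do
            qrun "$job"
        done
        sleep 1
    done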

Hope there's a nugget or two in this for you. My personal belief is that
there is no simple answer to your question, or the world would need nothing
but a FIFO scheduler and one queue. Lucky are those for whom that's enough.


----- Original Message ----- 
From: "Joshua Bernstein" <jbernstein at penguincomputing.com>
To: "torqueusers" <torqueusers at supercluster.org>
Sent: Thursday, December 04, 2008 3:56 PM
Subject: [torqueusers] Considerations for Clusters Running Lots of Small 
Jobs


> Hi All,
>
> The TORQUE documentation contains a nice explanation for running TORQUE on
> a large cluster. But are these ideas also pertinent to, say, a very small,
> four-node cluster running many thousands of short-lived jobs? It's very
> common in the BioIT space to have a comparatively small cluster, but with
> many thousands of jobs lasting only a few seconds. Does anybody have any
> guidance on configuration or even source-level changes for a
> high-throughput, small cluster with short-lived jobs? Or would we expect
> the same changes for a large cluster to also be applicable to this
> configuration?
>
> -Joshua Bernstein
> Software Engineer
> Penguin Computing
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers 


