[torquedev] job arrays?

Garrick Staples garrick at usc.edu
Fri Apr 7 16:09:13 MDT 2006


On Fri, Apr 07, 2006 at 05:28:11PM -0400, Troy Baer alleged:
> On Fri, 2006-04-07 at 16:36 -0400, Andrew J Caird wrote:
> > Say I have a job that I want to run with 500 parameters, but I have 100 
> > computers and 20 other users, with a limit of 20 nodes per person.  I 
> > submit my job array of 500 jobs, and they start when and where they can 
> > within the constraints of the scheduler - to the scheduler it looks like 
> > 500 jobs.  qsub, qstat, qdel, etc., though, treat it as one job by 
> > default, so qdel'ing it kills all of them.  There would be an option to 
> > qstat to get details out of a job array.
> 
> This sounds sort of like "job steps" in LoadLeveler and SLURM.  To do
> this in TORQUE would require some sort of meta-jobid field, along with
> teaching qsub and friends how to recognize and handle it.  That sounds
> potentially messy and invasive...
> 
> Alternatively, this could be handled by a "qsub-array" wrapper that
> submits a bunch of jobs with a common jobname and a corresponding
> "qdel-array" wrapper that would delete them all via judicious use of
> qselect.
> It's not anywhere close to perfect (for instance, a job array wouldn't
> show up as a single job in qstat in this scenario), but it might be a
> starting point and/or workaround.

Right.  Yuck.
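
For concreteness, the wrapper approach would look roughly like the sketch
below.  This is only an illustration under assumptions: the function names
and the ARRAY_PARAM variable are made up here, and nothing like this ships
with TORQUE.  It relies only on qsub -N / -v, qselect -N, and qdel.

```python
import shlex
import subprocess


def array_submit_commands(job_name, script, parameters):
    """Build one qsub command per parameter; every job shares job_name.

    ARRAY_PARAM is a hypothetical variable name the job script would read.
    """
    return [
        ["qsub", "-N", job_name, "-v", f"ARRAY_PARAM={param}", script]
        for param in parameters
    ]


def array_delete(job_name, run=subprocess.run):
    """Delete every job named job_name: qselect finds the ids, qdel removes them."""
    selected = run(["qselect", "-N", job_name],
                   capture_output=True, text=True, check=True)
    job_ids = selected.stdout.split()
    if job_ids:
        run(["qdel", *job_ids], check=True)
    return job_ids


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    for cmd in array_submit_commands("sweep", "run_one.sh", range(3)):
        print(shlex.join(cmd))
```

Note that the scheduler still sees N unrelated jobs, and two "arrays"
submitted under the same name would be deleted together, which is part of
why this is only a workaround.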


> > My weak understanding of mpiexec is that it doesn't do this.
> > 
> > Does that make sense?  I am struggling with it myself, so any dialog would 
> > be appreciated.
> 
> mpiexec can be made to do something sort of akin to this via the -config
> option, but they're always going to be started in lock-step as a unit.

I think mpiexec (or pbsdsh with some scheduling mods), plus dynamically
sized jobs (which don't exist yet), fits the requirements.

Just describe the tasks in one big config file that lists each task's
executable, env vars, and args, and hand that file to a TM launcher.
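
As a rough sketch of what such a config file could look like, assume one
task per line in the form "VAR=value ... executable arg ...".  That format
is invented here for illustration; it is not mpiexec's -config syntax or
anything TORQUE defines.  Parsing it is short:

```python
import shlex


def parse_task_line(line):
    """Split 'VAR=value ... executable arg ...' into (env dict, argv list)."""
    env, argv = {}, []
    for token in shlex.split(line, comments=True):
        name, sep, value = token.partition("=")
        # Leading NAME=value tokens are env vars; everything after the
        # first non-assignment token belongs to the command line.
        if not argv and sep and name.isidentifier():
            env[name] = value
        else:
            argv.append(token)
    return env, argv


def parse_task_file(text):
    """Return a list of (env, argv) tasks, skipping blank and comment lines."""
    tasks = []
    for line in text.splitlines():
        env, argv = parse_task_line(line)
        if argv:
            tasks.append((env, argv))
    return tasks
```

A launcher would then walk the resulting list and start each task on
whatever node is free.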

We could even add a new job resource to report back the number of
completed tasks.
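
To make the idea concrete, here is a minimal sketch of a launcher loop that
keeps a fixed number of tasks in flight and counts the ones that finish
cleanly.  Assumptions: tasks are (env, argv) pairs, and launch_local runs
them on the local host as a stand-in for a real TM-based launcher.  The
count it returns is what a new job resource could report; no such resource
exists today.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def launch_local(task):
    """Stand-in launcher: run one (env, argv) task locally, return its exit status."""
    env, argv = task
    return subprocess.run(argv, env={**os.environ, **env}).returncode


def run_tasks(tasks, slots, launch=launch_local):
    """Run tasks with at most `slots` in flight; return how many exited with status 0."""
    with ThreadPoolExecutor(max_workers=slots) as pool:
        return sum(1 for status in pool.map(launch, tasks) if status == 0)
```

Here `slots` stands for however many nodes the job currently holds; with
dynamically sized jobs it would change over the life of the job, which this
sketch does not attempt to model.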

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California