[torquedev] job arrays?

Troy Baer troy at osc.edu
Fri Apr 7 15:28:11 MDT 2006


On Fri, 2006-04-07 at 16:36 -0400, Andrew J Caird wrote:
> Suppose I have a job that I want to run with 500 parameters, but I have 100 
> computers and 20 other users with limits of 20 nodes per person.  So I 
> submit my job array of 500 jobs, and they start when and where they can 
> within the constraints of the scheduler - to the scheduler it looks like 
> 500 jobs.  qsub, qstat, qdel, etc., though, treat it as one job by 
> default, so qdel'ing it kills all of them.  There would be an option to 
> qstat to get details out of a job array.

This sounds sort of like "job steps" in LoadLeveler and SLURM.  To do
this in TORQUE would require some sort of meta-jobid field, along with
teaching qsub and friends how to recognize and handle it.  That sounds
potentially messy and invasive...

Alternatively, this could be handled by a "qsub-array" wrapper that
submits a bunch of jobs with a common jobname and a corresponding
"qdel-array" wrapper that would delete them all via judicious use of
qselect.  It's not anywhere close to perfect (for instance, a job array
wouldn't show up as a single job in qstat in this scenario), but it
might be a starting point and/or workaround.
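
A rough sketch of what such a wrapper pair might look like follows. It
is written in Python purely for illustration and is not an existing
TORQUE tool.  The function names, the ARRAY_PARAM environment variable,
and the injectable "runner" argument are all assumptions made for this
sketch; only the qsub -N and -v options, qselect -N, and qdel are
standard commands.

```python
import subprocess


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout


def qsub_array(script, name, params, runner=run):
    """Submit one job per parameter, all sharing the job name `name`."""
    job_ids = []
    for p in params:
        # ARRAY_PARAM is a hypothetical variable name; the job script
        # would have to read it to pick up its parameter.
        out = runner(["qsub", "-N", name, "-v", f"ARRAY_PARAM={p}", script])
        job_ids.append(out.strip())
    return job_ids


def qdel_array(name, runner=run):
    """Delete every job that qselect lists under the job name `name`."""
    job_ids = runner(["qselect", "-N", name]).split()
    if job_ids:
        # Only call qdel when qselect actually found something.
        runner(["qdel", *job_ids])
    return job_ids
```

Note that this selects purely on the job name, so any unrelated job
that happens to share the name would be deleted as well; restricting
the qselect call further (for example by user) would probably be wise.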

> My weak understanding of mpiexec is that it doesn't do this.
> 
> Does that make sense?  I am struggling with it myself, so any dialog would 
> be appreciated.

mpiexec can be made to do something sort of akin to this via the -config
option, but the processes it launches are always going to be started in
lock-step as a unit.

	--Troy
-- 
Troy Baer                       troy at osc.edu
Science & Technology Support    http://www.osc.edu/hpc/
Ohio Supercomputer Center       614-292-9701


