[torquedev] job arrays?
Troy Baer
troy at osc.edu
Fri Apr 7 15:28:11 MDT 2006
On Fri, 2006-04-07 at 16:36 -0400, Andrew J Caird wrote:
> If I have a job that I want to run with 500 parameters, but I have 100
> computers and 20 other users with limits of 20 nodes per person. So I
> submit my job array of 500 jobs, and they start when and where they can
> within the constraints of the scheduler - to the scheduler it looks like
> 500 jobs. qsub, qstat, qdel, etc., though, treat it as one job by
> default, so qdel'ing it kills all of them. There would be an option to
> qstat to get details out of a job array.
This sounds sort of like "job steps" in LoadLeveler and SLURM. To do
this in TORQUE would require some sort of meta-jobid field, along with
teaching qsub and friends how to recognize and handle it. That sounds
potentially messy and invasive...
Alternatively, this could be handled by a "qsub-array" wrapper that
submits a bunch of jobs with a common jobname and a corresponding
"qdel-array" wrapper that would delete them all via judicious use of
qselect. It's not anywhere close to perfect (for instance, a job array
wouldn't show up as a single job in qstat in this scenario), but it
might be a starting point and/or workaround.
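
A rough sketch of what such a pair of wrappers might look like, in
Python. None of this exists in TORQUE; the function names, the
ARRAY_PARAM environment variable, and the injectable "run" argument
are all made up for illustration. The only TORQUE features assumed
are "qsub -N" and "qsub -v" to name the job and pass one parameter
per job, "qselect -N" to find jobs by name, and qdel taking several
job ids.

```python
import subprocess


def build_qsub_commands(script, jobname, params, var="ARRAY_PARAM"):
    """Return one qsub argv per parameter, all sharing the same job name.

    ARRAY_PARAM is a hypothetical variable name; the job script would read
    it from its environment to pick its parameter.
    """
    return [["qsub", "-N", jobname, "-v", f"{var}={p}", script] for p in params]


def qsub_array(script, jobname, params, run=subprocess.run):
    """Submit one job per parameter and return the job ids printed by qsub."""
    job_ids = []
    for argv in build_qsub_commands(script, jobname, params):
        result = run(argv, capture_output=True, text=True, check=True)
        job_ids.append(result.stdout.strip())
    return job_ids


def qdel_array(jobname, run=subprocess.run):
    """Delete every job with the shared name, located via qselect -N."""
    result = run(["qselect", "-N", jobname],
                 capture_output=True, text=True, check=True)
    job_ids = result.stdout.split()
    if job_ids:
        # qdel accepts several job ids in a single invocation.
        run(["qdel", *job_ids], check=True)
    return job_ids
```

For example, qsub_array("run.sh", "sweep", range(500)) would submit
500 independent jobs named "sweep", and qdel_array("sweep") would
remove whatever is left of them. This relies on the job name not
being reused for unrelated jobs, and the scheduler still sees 500
separate jobs.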
> My weak understanding of mpiexec is that it doesn't do this.
>
> Does that make sense? I am struggling with it myself, so any dialog would
> be appreciated.
mpiexec can be made to do something sort of akin to this via the -config
option, but the processes are always going to be started in lock-step as
a unit, not when and where the scheduler can fit them individually.
--Troy
--
Troy Baer troy at osc.edu
Science & Technology Support http://www.osc.edu/hpc/
Ohio Supercomputer Center 614-292-9701