[torqueusers] Job pipeline software for use with Torque?
Josh Butikofer
josh at clusterresources.com
Tue Feb 10 10:18:59 MST 2009
Kevin,
I don't know of any Java layer that does what you are asking for. If you'll
excuse a plug for a Cluster Resources product, though, Moab provides pipeline
functionality similar to what you describe, and it also handles scheduling
for TORQUE. It might be worth looking into.
Josh Butikofer
Cluster Resources, Inc.
#############################
Kevin Murphy wrote:
> Hi,
>
> I'm looking for a layer, preferably in Java, that would sit on top of
> Torque and facilitate the construction and execution of large pipelines
> of jobs. I've written my own system in Perl, but it lacks some features
> I'd like, and I'd really prefer to be using Java, as I mentioned. What
> I'm imagining would come with an abstract job class to enforce useful
> and required methods; a job manager to instantiate the job objects,
> automatically build corresponding scripts and execute them; some way of
> providing configuration to the jobs; a way to define dependencies
> between jobs; and a way to report on pipeline progress.
>
> To give you an idea of what I'm looking for, I have written my own
> pipeline machinery in object-oriented Perl with an abstract Job class
> that wraps an invocation of qsub. I handle job dependencies via
> Torque's native mechanism, and I have a job manager class that makes it
> easy to define dependencies between job objects as they are created.
> When the pipeline is built, a root job is submitted first in a held
> state, and when the job manager class has finished creating all job
> objects and qsubbing the corresponding Torque jobs, it releases the hold
> on the root job.
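
The held-root pattern described above can be sketched roughly as follows. This is not the original Fable::Pipeline code: the helper names, script names, and job IDs are made up for illustration, and the commands are only built as strings so the sketch runs without a Torque installation. It relies on `qsub -h` (submit with a user hold), `qsub -W depend=afterok:...`, and `qrls`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: build Torque command lines for a pipeline whose
# root job is submitted held and released only after all jobs are queued.

# Root job: -h applies a user hold at submission time.
sub root_submit_cmd {
    my ($script) = @_;
    return "qsub -h $script";
}

# Dependent job: eligible to run only after every parent exits successfully.
sub dependent_submit_cmd {
    my ($script, @parent_ids) = @_;
    my $depend = 'afterok:' . join(':', @parent_ids);
    return "qsub -W depend=$depend $script";
}

# Release the hold on the root job once everything has been submitted.
sub release_cmd {
    my ($root_id) = @_;
    return "qrls $root_id";
}

# Example with made-up script names and job IDs, printed rather than run.
print root_submit_cmd('root.sh'), "\n";
print dependent_submit_cmd('create_db.sh', '101.server'), "\n";
print release_cmd('101.server'), "\n";
```

In a real job manager the job ID printed by each `qsub` call would be captured and passed on as the parent ID of later jobs.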
>
> I wanted to make it as easy as possible, and in some cases mandatory,
> for computational tasks to "do the right thing". The abstract methods
> of the job class are configure, job_label,
> verifyRequirementsAtSubmitTime, runtimeOutputsNeedRebuilding,
> runtimeInputsExist, and execute. This allows the job manager to ensure
> that all task prerequisites exist (on all nodes, by default). It also
> allows a generic, automatically generated job script to always check
> inputs and outputs, exit early if the outputs are more recent than the
> inputs, and otherwise execute. The generic job script also handles
> pipeline progress logging, so individual job classes don't need to
> worry about it.
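
One way a file-based job might implement the "outputs more recent than inputs" check is sketched below. It is an assumption-laden illustration, not the original code: the function name and the policy for edge cases (no declared outputs, equal timestamps) are choices made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the job must run, 0 if its outputs are
# already up to date relative to its inputs (compared by modification time).
sub outputs_need_rebuilding {
    my ($inputs, $outputs) = @_;

    # With no declared outputs there is nothing to prove the job ever ran.
    return 1 unless @$outputs;

    # Find the oldest output; a missing output forces a rebuild.
    my $oldest_output;
    for my $out (@$outputs) {
        my $mtime = (stat $out)[9];
        return 1 unless defined $mtime;
        $oldest_output = $mtime
            if !defined $oldest_output || $mtime < $oldest_output;
    }

    # Rebuild if any input is strictly newer than the oldest output.
    # Equal timestamps are treated as up to date (an arbitrary choice).
    for my $in (@$inputs) {
        my $mtime = (stat $in)[9];
        die "missing input: $in\n" unless defined $mtime;
        return 1 if $mtime > $oldest_output;
    }
    return 0;
}
```

A generated job script could call this before `execute` and exit with success when it returns 0, which is what makes blind reruns cheap.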
>
> One of the nice things about this pipeline machinery is that it makes it
> easy to restart a pipeline after one of the job objects has failed in
> the middle. You just fix the problem and blindly rerun the pipeline,
> assuming job classes conscientiously implement the abstract methods.
> The same set of jobs will be created as in the initial run, but the jobs
> that have already executed will detect this fact and won't bother
> invoking their 'execute' methods. (Jobs are free to force themselves to
> always execute, of course.)
>
> Here's one thing I'm missing: ideally the job manager and the job
> classes should have persistent memories, probably in a SQL database.
> Not only would this help reporting on pipeline progress and historical
> pipeline runs, but it would allow the jobs to learn from experience and
> report realistic resource requirements to Torque, which could help
> Torque to make better scheduling decisions (maybe).
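
As a rough illustration of "learning from experience", recorded runtimes could be turned into a walltime request. The sketch below assumes the history is already available as a list of seconds (where it is stored is left open), and the function name and safety factor are hypothetical. The result is formatted for Torque's `-l walltime=HH:MM:SS` resource request.

```perl
use strict;
use warnings;
use List::Util qw(max);
use POSIX qw(ceil);

# Hypothetical helper: derive a walltime string from past runtimes in
# seconds, padding the longest observed run by a safety factor.
sub walltime_request {
    my ($past_seconds, $safety_factor) = @_;

    # No history yet: let the caller fall back to a default.
    return unless @$past_seconds;

    my $secs = ceil(max(@$past_seconds) * $safety_factor);
    return sprintf('%02d:%02d:%02d',
        int($secs / 3600), int(($secs % 3600) / 60), $secs % 60);
}

# Example: three earlier runs, padded by 50%.
my $walltime = walltime_request([600, 900, 750], 1.5);
print "qsub -l walltime=$walltime job.sh\n";   # prints the 00:22:30 request
```

Using the maximum rather than the mean is a conservative choice, since a job that exceeds its walltime is killed.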
>
> I'm sure there are other features I'd like, but it's been a while since
> I messed with this.
>
> -Kevin Murphy
>
> P.S.: some code snippets to illustrate the above.
>
> Excerpt from the master pipeline script that wires job objects together:
>
> ...
> my $pipeline = Fable::Pipeline::JobManager->new(
>     config   => $fableConfig,
>     std_args => \%std_args,
>     %extraJobManagerOptions,
> ) or die "Error creating JobManager object";
> ...
> my $createDatabaseJobName = $pipeline->add(
>     class  => 'CreateDatabaseJob',
>     depend => $initialNotifyJobName,
> ); # aborts if error
>
> my $sectionWeightsTableJobName = $pipeline->add(
>     class   => 'SectionWeightsTableJob',
>     depend  => $createDatabaseJobName,
>     noInput => 1,
> ); # aborts if error
> ...
> $pipeline->run; # aborts if error
> # Merely submits and returns; pipeline progress is checked with
> # external tools
> ...
>
> @@@@ Sample job class: @@@@
>
> package Fable::Pipeline::CreateDatabaseJob;
> use Fable::Pipeline::Job;
> use strict;
> use warnings;
> our @ISA = qw(Fable::Pipeline::Job);
> our $VERSION = '0.01';
>
> sub job_label {
>     return 'FabCreateDB';
> }
>
> sub configure {
>     return 1; # This task has no config of its own
> }
>
> sub verifyRequirementsAtSubmitTime {
>     return 1; # Already checked by the Config object
> }
>
> sub runtimeInputsExist {
>     return 1; # No inputs, hence trivial success
> }
>
> sub runtimeOutputsNeedRebuilding {
>     my $self = shift;
>     my $dbChecker = Fable::Pipeline::PostgresqlChecker->new(
>         config => $self->config);
>     return !$dbChecker->createdb_database_exists;
> }
>
> sub execute {
>     my $self = shift;
>     my $dbChecker = Fable::Pipeline::PostgresqlChecker->new(
>         config => $self->config);
>     $dbChecker->create_createdb;
>     return 1;
> }
>