[torqueusers] Job pipeline software for use with Torque?

Josh Butikofer josh at clusterresources.com
Tue Feb 10 10:18:59 MST 2009


Kevin,

I don't know of any Java layer that does what you are asking for, but, if you'll 
excuse a plug for a Cluster Resources product, Moab provides pipeline 
functionality similar to what you are describing and also handles the scheduling 
for TORQUE. It might be worth looking into.

Josh Butikofer
Cluster Resources, Inc.
#############################


Kevin Murphy wrote:
> Hi,
> 
> I'm looking for a layer, preferably in Java, that would sit on top of 
> Torque and facilitate the construction and execution of large pipelines 
> of jobs.  I've written my own system in Perl, but it lacks some features 
> I'd like, and I'd really prefer to be using Java, as I mentioned.  What 
> I'm imagining would come with an abstract job class to enforce useful 
> and required methods; a job manager to instantiate the job objects, 
> automatically build corresponding scripts and execute them; some way of 
> providing configuration to the jobs; a way to define dependencies 
> between jobs; and a way to report on pipeline progress.
> 
> To give you an idea of what I'm looking for, I have written my own 
> pipeline machinery in object-oriented Perl with an abstract Job class 
> that wraps an invocation of qsub.  I handle job dependencies via 
> Torque's native mechanism, and I have a job manager class that makes it 
> easy to define dependencies between job objects as they are created.  
> When the pipeline is built, a root job is submitted first in a held 
> state, and when the job manager class has finished creating all job 
> objects and qsubbing the corresponding Torque jobs, it releases the hold 
> on the root job.
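
For anyone reading this in the archives, here is a minimal sketch of that
held-root-job flow using only stock qsub/qrls from the shell. The script names
are just placeholders, and this is not Kevin's code; his JobManager wraps the
same idea in objects:

#!/usr/bin/env perl
# Sketch only: submit a held root job, wire dependencies with Torque's
# native depend attribute, then release the hold once everything is queued.
use strict;
use warnings;

# Submit a script via qsub and return the job id it prints on stdout.
sub submit {
    my (@args) = @_;
    my $jobid = qx(qsub @args);
    die "qsub failed: $?" if $? != 0;
    chomp $jobid;
    return $jobid;
}

# 1. The root job goes in with a user hold (-h), so nothing starts yet.
my $root = submit('-h', 'root_job.sh');

# 2. Downstream jobs declare dependencies via -W depend=afterok:<jobid>.
my $create_db = submit('-W', "depend=afterok:$root",      'create_db.sh');
my $weights   = submit('-W', "depend=afterok:$create_db", 'weights.sh');

# 3. Once every job has been qsubbed, release the hold on the root and
#    Torque works through the pipeline in dependency order.
system('qrls', $root) == 0 or die "qrls failed: $?";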
> 
> I wanted to make it as easy as possible, and in some cases mandatory, 
> for computational tasks to "do the right thing".  The abstract methods 
> of the job class are configure, job_label, 
> verifyRequirementsAtSubmitTime, runtimeOutputsNeedRebuilding, 
> runtimeInputsExist, and execute.  This allows the job manager to ensure 
> that all task prerequisites exist (on all nodes, by default), and it 
> allows a generic, automatically generated job script to always check 
> inputs and outputs, terminate if the outputs are more recent than the 
> inputs, and of course, to execute.  The generic job script also handles 
> pipeline progress logging without individual job classes needing to 
> worry about that.
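
A generated wrapper of that kind can boil down to something like the sketch
below. This assumes a job object implementing the methods Kevin names; it is
illustrative only, not his actual generated script:

#!/usr/bin/env perl
# Illustrative wrapper logic around a Fable::Pipeline::Job-like object;
# the real generated script is not shown in this thread.
use strict;
use warnings;

sub run_job {
    my ($job, $log) = @_;    # $job implements the abstract methods above

    # Prerequisite inputs must be visible on the execution node.
    unless ($job->runtimeInputsExist) {
        print {$log} $job->job_label, ": missing inputs, aborting\n";
        return 0;
    }

    # Skip work whose outputs are already up to date; this is what makes
    # blindly rerunning a failed pipeline cheap.
    unless ($job->runtimeOutputsNeedRebuilding) {
        print {$log} $job->job_label, ": outputs current, skipping\n";
        return 1;
    }

    print {$log} $job->job_label, ": executing\n";
    my $ok = $job->execute;
    print {$log} $job->job_label, ($ok ? ": done\n" : ": FAILED\n");
    return $ok;
}

Keeping the progress logging in the wrapper is also what lets individual job
classes ignore it entirely, as described above.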
> 
> One of the nice things about this pipeline machinery is that it makes it 
> easy to restart a pipeline after one of the job objects has failed in 
> the middle.  You just fix the problem and blindly rerun the pipeline, 
> assuming job classes conscientiously implement the abstract methods.  
> The same set of jobs will be created as in the initial run, but the jobs 
> that have already executed will detect this fact and won't bother 
> invoking their 'execute' methods.  (Jobs are free to force themselves to 
> always execute, of course).
> 
> Here's one thing I'm missing: ideally the job manager and the job 
> classes should have persistent memories, probably in a SQL database.  
> Not only would this help reporting on pipeline progress and historical 
> pipeline runs, but it would allow the jobs to learn from experience and 
> report realistic resource requirements to Torque, which could help 
> Torque to make better scheduling decisions (maybe).
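
Purely as an illustration of how that persistent memory could look: nothing
like this exists in the code below, and the schema, helper names, and padding
factor are invented. A small DBI/SQLite sketch might be:

#!/usr/bin/env perl
# Hypothetical sketch: record what each job class actually used, then
# derive a qsub -l resource request from past runs.  Requires DBD::SQLite.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=pipeline_history.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS job_history (
    job_label   TEXT,
    walltime_s  INTEGER,
    mem_mb      INTEGER,
    finished_at TEXT DEFAULT CURRENT_TIMESTAMP
)
SQL

# Called by the job wrapper after a run, with whatever usage was measured.
sub record_run {
    my ($label, $walltime_s, $mem_mb) = @_;
    $dbh->do('INSERT INTO job_history (job_label, walltime_s, mem_mb) VALUES (?, ?, ?)',
             undef, $label, $walltime_s, $mem_mb);
}

# Called at submit time: pad the worst observed usage and hand the result
# to qsub, e.g.  qsub -l walltime=...,mem=...mb script.sh
sub resource_request {
    my ($label) = @_;
    my ($wall, $mem) = $dbh->selectrow_array(
        'SELECT MAX(walltime_s), MAX(mem_mb) FROM job_history WHERE job_label = ?',
        undef, $label);
    return unless defined $wall;
    my $padded = int($wall * 1.5) + 60;                  # crude safety margin
    return sprintf('walltime=%d:%02d:%02d,mem=%dmb',
                   int($padded / 3600), int($padded % 3600 / 60), $padded % 60,
                   int($mem * 1.5));
}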
> 
> I'm sure there are other features I'd like, but it's been a while since 
> I messed with this.
> 
> -Kevin Murphy
> 
> P.S.: some code snippets to illustrate the above.
> 
> Excerpt from the master pipeline script that wires job objects together:
> 
> ...
> my $pipeline = Fable::Pipeline::JobManager->new(config => $fableConfig,
>                                                 std_args => \%std_args,
>                                                 %extraJobManagerOptions)
>   or die "Error creating JobManager object";
> ...
> my $createDatabaseJobName
>  = $pipeline->add(
>                   class=>'CreateDatabaseJob',
>                   depend=>$initialNotifyJobName,
>                  );  # aborts if error
> 
> my $sectionWeightsTableJobName
>  = $pipeline->add(
>                   class=>'SectionWeightsTableJob',
>                   depend=>$createDatabaseJobName,
>                   noInput=>1,
>                  );  # aborts if error
> ...
> $pipeline->run;   # aborts if error
> # Merely submits and returns; pipeline progress is checked with external tools
> ...
> 
> @@@@ Sample job class: @@@@
> 
> package Fable::Pipeline::CreateDatabaseJob;
> use Fable::Pipeline::Job;
> use strict;
> use warnings;
> our @ISA = qw(Fable::Pipeline::Job);
> our $VERSION = '0.01';
> 
> sub job_label {
>    return 'FabCreateDB';
> }
> 
> sub configure {
>  return 1;  # This task has no config of its own
> }
> 
> sub verifyRequirementsAtSubmitTime {
>  return 1;  # Already checked by the Config object
> }
> 
> sub runtimeInputsExist {
>  return 1;  # No inputs, hence trivial success
> }
> 
> sub runtimeOutputsNeedRebuilding {
>  my $self = shift;
>  my $dbChecker = Fable::Pipeline::PostgresqlChecker->new(config => $self->config);
>  return !$dbChecker->createdb_database_exists;
> }
> 
> sub execute {
>  my $self = shift;
>  my $dbChecker = Fable::Pipeline::PostgresqlChecker->new(config => $self->config);
>  $dbChecker->create_createdb;
>  return 1;
> }
> 
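
The Fable::Pipeline::Job base class itself isn't shown in the post. Purely for
context, one common Perl pattern for enforcing abstract methods of this kind
looks roughly like the following; none of it is taken from Kevin's actual
module:

package My::AbstractJob;
# Illustration only, in the spirit of the abstract Job class described
# above; not Fable::Pipeline::Job itself.
use strict;
use warnings;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    my $self = bless { config => $args{config} }, $class;
    $self->configure or croak "$class: configure failed";
    return $self;
}

sub config { return $_[0]->{config} }

# Each required method dies unless a subclass overrides it, which is what
# makes implementing them effectively mandatory.
for my $method (qw(configure job_label verifyRequirementsAtSubmitTime
                   runtimeInputsExist runtimeOutputsNeedRebuilding execute)) {
    no strict 'refs';
    *{__PACKAGE__ . "::$method"} = sub {
        croak ref($_[0]) . " must implement $method";
    };
}

1;

A concrete job class like the CreateDatabaseJob above then just overrides those
six methods.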

