[torquedev] Re: [torqueusers] is there an epilogue.parallel script?

Garrick Staples garrick at usc.edu
Tue Mar 14 14:35:59 MST 2006


On Tue, Mar 14, 2006 at 01:17:14PM -0800, Michael Gutteridge alleged:
> On Tue, 2006-03-14 at 12:58 -0800, Garrick Staples wrote:
> > On Tue, Mar 14, 2006 at 12:41:32PM +0100, Bas van der Vlies alleged:
> > > Lennart Karlsson wrote:
> > > >Garrick,
> > > >
> > > >You wrote:
> > > >>Anyone else find that annoying?  I've always thought that the parallel
> > > >>scripts should run on all nodes.
> > > >
> > > I did not see the original post, only this reply.
> > > 
> > > >I agree. A parallel script should run on all nodes of the job.
> > > >
> > > I also agree. The first time I implemented the *.parallel script,
> > > I was surprised that it did not run on the master mom.
> > 
> > Anyone NOT agree with making this change for both parallel scripts?
> > 
> 
> I'm not 100% behind it, really.  I think it can be very handy to have
> different scripts set up (and tear down) the MOM and sisters.  I guess
> the larger issue is what the likely usage scenarios are for the
> prologue and epilogue scripts.
> 
> This also introduces a teeny bit of worry as to which script runs
> first... would actions in the prologue and prologue.parallel overlap,
> possibly conflict? What receives precedence?  What's right for the
> environment?

1. data stageins are directed by pbs_server to MS (mother superior)
2. pbs_server sends job script and job struct to MS
3. MS sends JOIN_JOB messages to sisters
4. sisters run prologue.parallel
5. sisters run prologue.user.parallel
6. sisters report back to MS
7. MS runs prologue
8. MS runs prologue.user
9. MS runs the job script
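
As a rough illustration of the ordering in the steps above, here is a small Python model. It is only a sketch and not TORQUE code: the function name, the node labels, and the event strings are made up for the example, and it lists the sisters one after another even though real sisters would run concurrently.

```python
# Hypothetical model of the startup order listed above; not TORQUE code.

def startup_order(sisters):
    """Return (node, action) pairs in the order the numbered steps describe.

    `sisters` is a list of made-up sister node names. Sisters are listed
    sequentially here for simplicity; in practice they run concurrently.
    """
    events = [
        ("MS", "data stagein"),                       # step 1
        ("MS", "receive job script and job struct"),  # step 2
    ]
    # Step 3: MS sends JOIN_JOB to every sister.
    events += [(s, "JOIN_JOB") for s in sisters]
    # Steps 4-6: each sister runs its parallel scripts, then reports back.
    for s in sisters:
        events.append((s, "prologue.parallel"))
        events.append((s, "prologue.user.parallel"))
        events.append((s, "report back to MS"))
    # Steps 7-9: only after all sisters have reported does MS continue.
    events += [
        ("MS", "prologue"),
        ("MS", "prologue.user"),
        ("MS", "job script"),
    ]
    return events


if __name__ == "__main__":
    for node, action in startup_order(["sister1", "sister2"]):
        print(f"{node}: {action}")
```

In this model every sister-side script finishes before the MS prologue starts, so the two sets of scripts do not overlap in time.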

I'd also like a "prologue.prerun" that runs on MS after step 2.  This is
analogous to the "epilogue.precancel" that already exists.

 
> I agree that at first read, "prologue.parallel" suggests that it should
> run on all nodes.  However, I'm unconvinced that that is the right
> thing.  Perhaps changing the name? "prologue.sisters"?

I say we make parallel scripts run on all nodes, add the "prerun"
script, and make sure the scripts can identify which node they are
running on ($PBS_NODENUM == 0 on MS).
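
A minimal sketch of how a shared script could branch on that check, written in Python. It assumes PBS_NODENUM is actually exported into the script's environment, which is part of the proposal here rather than something confirmed. The helper names and the "ms-setup" / "sister-setup" labels are hypothetical.

```python
import os


def is_mother_superior(env=None):
    """Return True when PBS_NODENUM is "0", i.e. the script is on MS.

    Assumption: PBS_NODENUM is present in the script's environment.
    If it is unset, this sketch treats the node as a sister.
    """
    env = os.environ if env is None else env
    return env.get("PBS_NODENUM") == "0"


def choose_setup(env=None):
    """Pick a (hypothetical) setup routine name for this node."""
    return "ms-setup" if is_mother_superior(env) else "sister-setup"


if __name__ == "__main__":
    print(choose_setup())
```

With a check like this, one parallel script can run on all nodes and still do different setup on MS and on the sisters.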

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California