[torquedev] Re: [torqueusers] is there an epilogue.parallel script?

Michael Gutteridge mgutteri at fhcrc.org
Wed Mar 15 09:50:19 MST 2006


On Tue, 2006-03-14 at 17:46 -0800, Garrick Staples wrote:
> On Tue, Mar 14, 2006 at 01:35:59PM -0800, Garrick Staples alleged:
> > > > Anyone NOT agree with making this change for both parallel scripts?
> > > > 

> > I say we make parallel scripts run on all nodes, add the "prerun"
> > script, and make sure the scripts can identify which node they are
> > running on ($PBS_NODENUM == 0 on MS)
> 
> Replying to myself as usual, 

'cause you're too bloody fast! 8-)

> here's a patch that does the above.  It
> adds prologue.prerun, adds epilogue.parallel, adds
> epilogue.user.parallel (we forgot about that one), has MS run all the
> parallel scripts as well, and adds $PBS_NODENUM to all pelog scripts.

I thought that the pelog script environment was really bare (i.e. root's
environment) and that this sort of information (i.e. node number) would
be passed in as an argument.

May just be some lag in the documentation, though.  Appendix G, section
1 indicates that "For all scripts, the environment passed to the script
is empty."  I don't know the pros/cons of doing arguments vs.
environment variables.
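
To make the two options concrete, here is a rough sketch of how a pelog
script might work out whether it is on MS. It is only an illustration: it
assumes the node number shows up either as $PBS_NODENUM (as the patch
proposes) or as the first argument (a made-up position, since no argument
slot has been defined for it). The helper name node_role is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for a prologue/epilogue script; not part of TORQUE.
# Assumption: the node number arrives as $PBS_NODENUM, or failing that as
# the first argument to the function (illustrative position only).

node_role() {
    # Prefer the environment variable; fall back to the first argument.
    nodenum="${PBS_NODENUM:-${1:-}}"
    case "$nodenum" in
        0)           echo "mother-superior" ;;  # node 0 is MS
        ''|*[!0-9]*) echo "unknown" ;;          # missing or not a number
        *)           echo "sister" ;;           # any other node number
    esac
}
```

A script could then branch on "$(node_role "$1")" and behave the same way
whichever mechanism ends up supplying the value.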

> 
> For job launch and exiting, note that MS' parallel scripts run _after_
> the sisters'.

Makes good sense.

> 
> So the steps from above are now:
> 1. data stagein
> 2. pbs_server sends job script and job struct to MS
> 3. MS runs prologue.prerun
> 4. MS sends JOIN_JOB messages to sisters
> 5. sisters run prologue.parallel
> 6. sisters run prologue.user.parallel
> 7. sisters report back to MS
> 8. MS runs prologue.parallel
> 9. MS runs prologue.user.parallel
> 10. MS runs prologue
> 11. MS runs prologue.user
> 12. MS runs the job script.
> 
> And similarly on the way back, MS runs epilogue.precancel, sisters run
> parallel epilogues, MS runs parallel epilogues, MS runs regular
> epilogues, and finally data stageout.
> 
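
The launch ordering quoted above (steps 3 through 12) can be written out
as a small sketch. This is not pbs_mom code, just a way to print the
proposed order; the function name launch_order and its sister-count
parameter are made up for illustration, and stagein and the job transfer
from pbs_server (steps 1 and 2) are left out.

```shell
#!/bin/sh
# Prints the proposed launch-time script order, one "<node>: <script>" line
# at a time. Illustration only; the real sequencing happens inside pbs_mom.
# Sisters are listed one after another here, although in practice they
# would run their scripts concurrently.

launch_order() {
    sisters="$1"    # number of sister nodes (hypothetical parameter)

    echo "MS: prologue.prerun"

    # Each sister runs its parallel prologues before reporting back to MS.
    n=1
    while [ "$n" -le "$sisters" ]; do
        for script in prologue.parallel prologue.user.parallel; do
            echo "sister$n: $script"
        done
        n=$((n + 1))
    done

    # MS runs its parallel prologues after the sisters, then the regular ones.
    for script in prologue.parallel prologue.user.parallel prologue prologue.user; do
        echo "MS: $script"
    done

    echo "MS: job script"
}
```

Calling launch_order 2 lists the prerun script first, then both sisters'
parallel prologues, then the four MS prologues, and finally the job script.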

Should be good.  Thanks for all your fine work...

Michael
