On Tue, Mar 14, 2006 at 05:46:57PM -0800, Garrick Staples alleged:
> > I say we make parallel scripts run on all nodes, add the "prerun"
> > script, and make sure the scripts can identify which node they are
> > running on ($PBS_NODENUM == 0 on MS)
> Replying to myself as usual, here's a patch that does the above.  It
> adds prologue.prerun, adds epilogue.parallel, adds
> epilogue.user.parallel (we forgot about that one), has MS run all
> parallel scripts on MS, and adds $PBS_NODENUM to all pelog scripts.
> For job launch and exiting, note that MS' parallel scripts run _after_
> the sisters'.
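
As a rough illustration of the $PBS_NODENUM check described above, a
parallel pelog script written in Python might branch like this.  It is
only a sketch: it assumes the patch exports PBS_NODENUM into the script's
environment, and the role names and empty branches are placeholders.

```python
import os
import sys


def node_role(env):
    """Return "mother superior" or "sister" based on PBS_NODENUM.

    Assumes the proposed patch exports PBS_NODENUM to every pelog
    script, with node 0 being the mother superior (MS).
    """
    nodenum = env.get("PBS_NODENUM")
    if nodenum is None:
        raise KeyError("PBS_NODENUM is not set; not a patched pelog run?")
    return "mother superior" if int(nodenum) == 0 else "sister"


def main():
    role = node_role(os.environ)
    if role == "mother superior":
        pass  # placeholder: job-wide work that should run only on MS
    else:
        pass  # placeholder: per-node work, e.g. local scratch cleanup
    return 0  # exit status 0 reports success


if __name__ == "__main__":
    # Outside a patched pelog run PBS_NODENUM is absent; do nothing then.
    if "PBS_NODENUM" in os.environ:
        sys.exit(main())
```

Since MS' parallel scripts run after the sisters', the MS branch is the
natural place for anything that must happen last.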

Turns out, I'm not finding these changes to be all that useful.  The
lack of $PBS_NODEFILE on sisters and during prologue.prerun, plus the
fact that prologue.parallel has no way of knowing the hostname of MS,
makes these worthless for my purposes.

To make these useful for *me*, we'd need to add a $PBS_MSHOST for
parallel scripts and create $PBS_NODEFILE much earlier in the process.
But at the end of the day, it doesn't really get me anything more than I
currently have.
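
For illustration only, this is roughly what a parallel script on a
sister would have to do to find MS.  PBS_MSHOST is just the variable
proposed here and does not exist today; the fallback assumes the first
line of $PBS_NODEFILE names MS, which only helps if the node file has
already been created when the script runs.

```python
import os


def mother_superior_host(env):
    """Best-effort guess at the mother superior's hostname.

    PBS_MSHOST is hypothetical (proposed, not implemented).  The
    fallback assumes the first non-empty line of $PBS_NODEFILE is MS.
    Returns None when neither source is usable.
    """
    host = env.get("PBS_MSHOST")
    if host:
        return host
    nodefile = env.get("PBS_NODEFILE")
    if nodefile and os.path.exists(nodefile):
        with open(nodefile) as f:
            for line in f:
                line = line.strip()
                if line:
                    return line
    return None
```

On a sister today, with neither variable available, this simply returns
None, which is the problem described above.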

I know multiple people have asked for epilogue.parallel, so that will go
in.  epilogue.user.parallel is documented, so it should go in.

But I have some questions... 

Is having parallel scripts executed on MS actually useful to anyone?  Or
is this backwards-incompatible change just a "makes sense to me"
thing?  You could easily duplicate it by having prologue run

Would prologue.prerun actually be useful to anyone?  No one has asked
for it, so I'm inclined to drop that idea.

How are parallel and user scripts currently used?  I can't come up with
any good reasons for them (without the other changes I mentioned
above).

