[torqueusers] epilogue.parallel and prologue.parallel scripts don't run on the mother superior (Torque 2.3.6)

Jerry Smith jdsmit at sandia.gov
Tue Jan 26 10:38:52 MST 2010


To my recollection it has always been this way; whether it is a bug or not, I 
don't know.
We get around it by calling {prologue,epilogue}.parallel from the 
prologue/epilogue scripts themselves.
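
That workaround can be as simple as the following sketch (the mom_priv 
path is the default for a typical TORQUE install and may differ on your 
system; the same pattern applies to the epilogue):

    #!/bin/sh
    # prologue: runs on the mother superior before the job starts.
    # TORQUE does not invoke prologue.parallel here, so chain to it
    # manually, forwarding the arguments TORQUE passed to this script.
    PARALLEL=/var/spool/torque/mom_priv/prologue.parallel
    if [ -x "$PARALLEL" ]; then
        "$PARALLEL" "$@"
        exit $?
    fi
    exit 0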

Not all jobs use the mother superior as a "compute" sister; there are 
flags in mpiexec that specifically leave the controlling mom out, e.g. 
-nolocal:
       -nolocal (not MPICH/P4)
             Do not run any MPI processes on the local compute node.  In
             a batch job, one of the machines allocated to run a parallel
             job will run the batch script and thus invoke mpiexec.
             Normally it participates in running the parallel application,
             but this option disables that for special situations where
             that node is needed for other processing.
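
For illustration, a batch script using that flag might look like this 
(the resource request and application name are placeholders):

    #!/bin/sh
    #PBS -l nodes=4:ppn=8
    cd "$PBS_O_WORKDIR"
    # The batch script itself runs on the mother superior; -nolocal
    # keeps all MPI ranks off that node, so only the remaining
    # allocated nodes do the computation.
    mpiexec -nolocal ./my_mpi_app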

We have a few users who do this, either to reduce "chatter" from a node 
that carries a bit more load (the controlling mom), or to use rank 0 as 
a launching point for monitoring their job.

--Jerry


Lech Nieroda wrote:
> Dear list,
>
> I've run some tests with MPI jobs and various prologue/epilogue
> scripts and have noticed a strange behaviour: the
> prologue.parallel/epilogue.parallel scripts are not invoked on the
> "Mother Superior" node at all. Even after setting the log variables to
> the most verbose level ($logevent 511, $loglevel 7) no error message
> appears.
>
> According to the Appendix G in the manual (
> http://www.clusterresources.com/products/torque/docs/a.gprologueepilogue.shtml
> ) the "Mother Superior" is also a "Sister" and should thus run the
> parallel scripts...
> Is this a bug or is it intended?
>
> Regards,
> Lech


