[torqueusers] epilogue.parallel and prologue.parallel scripts don't run on the mother superior (Torque 2.3.6)
Jerry Smith
jdsmit at sandia.gov
Tue Jan 26 10:38:52 MST 2010
To my recollection it has always been this way. Whether it is a bug or
not, I don't know.
We get around it by calling {prologue,epilogue}.parallel in the
prologue/epilogue.
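
A minimal sketch of that workaround, meant to sit in mom_priv/prologue
(the epilogue would do the same with epilogue.parallel). The mom_priv
path below is an assumption and may differ per site, and
run_parallel_script is just a hypothetical helper name. It passes
through whatever arguments the prologue itself received.

```shell
#!/bin/sh
# Sketch: run the matching .parallel script on the mother superior too.
# MOM_PRIV is an assumed default location; adjust it for your install.
MOM_PRIV="${MOM_PRIV:-/var/spool/torque/mom_priv}"

# run_parallel_script NAME [ARGS...]
# Runs $MOM_PRIV/NAME with ARGS if it exists and is executable.
# Does nothing (and returns 0) when the script is missing.
run_parallel_script() {
    script="$MOM_PRIV/$1"
    shift
    if [ -x "$script" ]; then
        "$script" "$@"
    fi
}

# Hand the prologue's own arguments through unchanged.
run_parallel_script prologue.parallel "$@"
```

Since the call is the last command, the prologue exits with the status
of prologue.parallel, so a failing parallel script will show up as a
failing prologue on the mother superior.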
Not all jobs use the mother superior as a "compute" sister. There are
flags in mpiexec that specifically leave the controlling mom out, e.g.
-nolocal:
-nolocal (not MPICH/P4)
    Do not run any MPI processes on the local compute node. In a
    batch job, one of the machines allocated to run a parallel job
    will run the batch script and thus invoke mpiexec. Normally it
    participates in running the parallel application, but this option
    disables that for special situations where that node is needed
    for other processing.
We have a few users who do this, either to reduce "chatter" from a node
that has a bit more load (the controlling mom), or to use the rank 0 as
a launching point for monitoring their job.
--Jerry
Lech Nieroda wrote:
> Dear list,
>
> I've run some tests with MPI jobs and various prologue/epilogue
> scripts and have noticed a strange behaviour: the
> prologue.parallel/epilogue.parallel scripts are not invoked on the
> "Mother Superior" node at all. Even after setting the log variables to
> the most verbose level ($logevent 511, $loglevel 7), no error message
> appears.
>
> According to Appendix G in the manual (
> http://www.clusterresources.com/products/torque/docs/a.gprologueepilogue.shtml
> ) the "Mother Superior" is also a "Sister" and should thus run the
> parallel scripts...
> Is this a bug or is it intended?
>
> Regards,
> Lech