[torquedev] pbs_demux
Garrick Staples
garrick at usc.edu
Fri Mar 3 17:50:34 MST 2006
On Fri, Mar 03, 2006 at 10:29:53AM -0500, Prakash Velayutham alleged:
> Hi All,
>
> This question is about a multi-node, MPI-style job. After the job
> is scheduled and sent over to the MS, MS first does a JOIN_JOB request
> to all the other nodes in the exec_host list. The nodes respond with an
> ALL_OKAY message and send the event_com as JOIN_JOB. After MS receives
> ALL_OKAY from all the sister nodes, I can see that MS goes through the
> processes of TMomFinalizeJob1, TMomFinalizeJob2, TMomFinalizeChild,
> TMomFinalizeJob3 routines. In the TMomFinalizeChild routine, MS also
> starts up a pbs_demux process in addition to the job task. What I don't
> understand is where exactly in this sequence MS tells the other nodes
> to start the job. Could someone please explain?
It doesn't. Once the sisters have received the JOIN_JOB request, they are
part of the job. Notice that jobs listed on the sisters are always in
state "starting."
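
To make the sequence concrete, here is a toy sketch of the handshake as
described in this thread. It is not TORQUE source code: the Sister and
MotherSuperior classes, their methods, and the job id are hypothetical
names made up for illustration. Only JOIN_JOB, ALL_OKAY, the
TMomFinalize* routine names, and the "starting" state come from the
messages above.

```python
# Toy model of the JOIN_JOB handshake; NOT TORQUE source code.
# Class names, method names, and the job id are hypothetical.


class Sister:
    def __init__(self, name):
        self.name = name
        self.jobs = {}  # job id -> state as seen on this sister

    def handle_join_job(self, job_id):
        # Receiving JOIN_JOB is what makes this node part of the job.
        # No later "start" message arrives, so the state stays "starting".
        self.jobs[job_id] = "starting"
        return "ALL_OKAY"


class MotherSuperior:
    def __init__(self, sisters):
        self.sisters = sisters
        self.log = []  # routines run on MS, in order

    def run_job(self, job_id):
        # Step 1: send JOIN_JOB to every sister and collect the replies.
        replies = [s.handle_join_job(job_id) for s in self.sisters]
        if any(reply != "ALL_OKAY" for reply in replies):
            raise RuntimeError("a sister rejected JOIN_JOB")
        # Step 2: all sisters answered, so MS finalizes and launches locally.
        for step in ("TMomFinalizeJob1", "TMomFinalizeJob2",
                     "TMomFinalizeChild", "TMomFinalizeJob3"):
            self.log.append(step)
        # Nothing further is sent to the sisters at this point.


sisters = [Sister("node2"), Sister("node3")]
ms = MotherSuperior(sisters)
ms.run_job("123.server")
print([s.jobs["123.server"] for s in sisters])  # ['starting', 'starting']
```

The point of the sketch is only that nothing in run_job tells the
sisters to start the job after the join step; their view of the job
stays at "starting".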
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California