[torquedev] pbs_demux

Garrick Staples garrick at usc.edu
Fri Mar 3 17:50:34 MST 2006


On Fri, Mar 03, 2006 at 10:29:53AM -0500, Prakash Velayutham alleged:
> Hi All,
> 
> This question is about multi-node, MPI-style jobs. After the job is 
> scheduled and sent to the MS (Mother Superior), the MS first sends a 
> JOIN_JOB request to all the other nodes in the exec_host list. The nodes 
> respond with an ALL_OKAY message and send the event_com as JOIN_JOB. 
> After the MS receives ALL_OKAY from all the sister nodes, I can see it 
> go through the TMomFinalizeJob1, TMomFinalizeJob2, TMomFinalizeChild, 
> and TMomFinalizeJob3 routines. In TMomFinalizeChild, the MS also starts 
> a pbs_demux process in addition to the job task. What I don't 
> understand is where exactly in this sequence the MS tells the other 
> nodes to start the job. Could someone explain, please?

It doesn't.  Once the sisters have the JOIN_JOB request, they are part
of the job.  Notice that the job as listed on the sisters is always in
state "starting."
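
To make the sequence concrete, here is a rough toy model of the handshake
described in this thread. It is not TORQUE code: the Sister and
MotherSuperior classes, their fields, and the job id are made up for
illustration. Only the message names, the routine names, and the
"starting" state come from the discussion above.

```python
from dataclasses import dataclass, field

# Message names taken from the thread; everything else here is hypothetical.
JOIN_JOB = "JOIN_JOB"
ALL_OKAY = "ALL_OKAY"


@dataclass
class Sister:
    """Toy stand-in for a sister node (not a real TORQUE structure)."""
    name: str
    jobs: dict = field(default_factory=dict)  # job id -> state string

    def handle(self, message, job_id):
        # Joining is the only step: the sister records the job and
        # acknowledges. It never receives a separate "start" message.
        if message == JOIN_JOB:
            self.jobs[job_id] = "starting"
            return ALL_OKAY
        raise ValueError(f"unexpected message: {message}")


@dataclass
class MotherSuperior:
    """Toy stand-in for the MS (not a real TORQUE structure)."""
    sisters: list
    sent: list = field(default_factory=list)   # (sister name, message) log
    steps: list = field(default_factory=list)  # routine names, in order

    def start_job(self, job_id):
        # 1. Send JOIN_JOB to every sister and collect the replies.
        replies = []
        for sister in self.sisters:
            self.sent.append((sister.name, JOIN_JOB))
            replies.append(sister.handle(JOIN_JOB, job_id))

        # 2. Proceed only if every sister answered ALL_OKAY.
        if any(reply != ALL_OKAY for reply in replies):
            return False

        # 3. Record the routine names in the order given in the thread;
        #    per the thread, TMomFinalizeChild is where the job task and
        #    pbs_demux are started on the MS.
        self.steps += [
            "TMomFinalizeJob1",
            "TMomFinalizeJob2",
            "TMomFinalizeChild",
            "TMomFinalizeJob3",
        ]
        return True


if __name__ == "__main__":
    sisters = [Sister("node2"), Sister("node3")]
    ms = MotherSuperior(sisters)
    ms.start_job("42.server")
    print(ms.sent)                      # only JOIN_JOB messages were sent
    print([s.jobs for s in sisters])    # each sister still shows "starting"
```

The point of the sketch is the message log: JOIN_JOB is the only thing
the MS sends to the sisters, and the job entry on each sister stays in
"starting" afterwards.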

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California