[torquedev] pbs_demux

Prakash Velayutham velayups at email.uc.edu
Fri Mar 3 08:29:53 MST 2006

Hi All,

This question is about a multi-node, MPI-style job. After the job is 
scheduled and sent to the Mother Superior (MS), MS first sends a JOIN_JOB 
request to all the other nodes in the exec_host list. Each node responds 
with an ALL_OKAY message with the event_com set to JOIN_JOB. After MS 
receives ALL_OKAY from all the sister nodes, I can see that MS goes 
through the TMomFinalizeJob1, TMomFinalizeJob2, TMomFinalizeChild, and 
TMomFinalizeJob3 routines. In TMomFinalizeChild, MS also starts a 
pbs_demux process in addition to the job task. What I don't understand 
is where exactly in this sequence MS tells the other nodes to start the 
job as well. Could someone please explain?
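
To make the sequence described above concrete, here is a toy Python model 
of it as I understand it. It is only a sketch, not TORQUE code: the 
function names (sister_handle, mother_superior_join), the node names, and 
the assumption that the first exec_host entry is MS are all made up for 
illustration. Only the message and routine names come from what I 
observed.

```python
# Toy model of the join handshake as observed; NOT TORQUE source code.
# All function, variable, and node names here are hypothetical.

FINALIZE_STEPS = [
    "TMomFinalizeJob1",
    "TMomFinalizeJob2",
    "TMomFinalizeChild",  # MS starts pbs_demux and the job task here
    "TMomFinalizeJob3",
]


def sister_handle(request):
    """A sister acknowledges JOIN_JOB with ALL_OKAY, echoing the event command."""
    if request == "JOIN_JOB":
        return {"reply": "ALL_OKAY", "event_com": "JOIN_JOB"}
    return {"reply": "ERROR", "event_com": request}


def mother_superior_join(exec_host):
    """Return the ordered events seen on MS for one job.

    Assumption: the first entry of exec_host is MS, the rest are sisters.
    """
    ms, sisters = exec_host[0], exec_host[1:]
    trace = []
    pending = set(sisters)
    for node in sisters:
        trace.append(f"{ms} -> {node}: JOIN_JOB")
        answer = sister_handle("JOIN_JOB")
        trace.append(
            f"{node} -> {ms}: {answer['reply']} (event_com={answer['event_com']})"
        )
        if answer["reply"] == "ALL_OKAY":
            pending.discard(node)
    # MS only moves on to the finalize routines once every sister has agreed.
    if not pending:
        trace.extend(f"{ms}: {step}" for step in FINALIZE_STEPS)
    return trace


trace = mother_superior_join(["node01", "node02", "node03"])
print("\n".join(trace))
```

Nothing in this trace shows MS telling node02 or node03 to start the 
job, which is exactly the step I cannot locate.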

