[torquedev] pbs_demux
Prakash Velayutham
velayups at email.uc.edu
Fri Mar 3 08:29:53 MST 2006
Hi All,
This question is about multi-node (MPI-style) jobs. After the job is
scheduled and sent to the mother superior (MS), MS first sends a
JOIN_JOB request to all the other nodes in the exec_host list. Each node
responds with an ALL_OKAY message whose event_com is JOIN_JOB. After MS
has received ALL_OKAY from all the sister nodes, I can see that it goes
through the TMomFinalizeJob1, TMomFinalizeJob2, TMomFinalizeChild and
TMomFinalizeJob3 routines, in that order. In TMomFinalizeChild, MS
starts a pbs_demux process in addition to the job task. What I don't
understand is where exactly in this sequence MS tells the other nodes
to start the job as well. Could someone please explain?
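
To make the sequence concrete, here is a toy Python model of what I
observe. It is only a sketch of my understanding, not TORQUE source
code: run_join_phase, reply_from and the node names are hypothetical,
and treating the first exec_host entry as MS is an assumption. Only the
message and routine names come from what I see in the mom.

```python
# Toy model of the sequence described above; not TORQUE source code.
FINALIZE_STEPS = [
    "TMomFinalizeJob1",
    "TMomFinalizeJob2",
    "TMomFinalizeChild",  # MS starts pbs_demux and the job task here
    "TMomFinalizeJob3",
]


def run_join_phase(exec_host, reply_from):
    """Return the ordered list of events MS goes through for one job.

    exec_host  -- list of node names; the first entry is assumed to be MS
    reply_from -- hypothetical callable: sister name -> reply string
    """
    ms, sisters = exec_host[0], exec_host[1:]
    events = []
    pending = set(sisters)

    for sister in sisters:
        events.append(f"{ms} -> {sister}: JOIN_JOB")
        reply = reply_from(sister)
        events.append(f"{sister} -> {ms}: {reply}")
        if reply == "ALL_OKAY":
            pending.discard(sister)

    if pending:
        # Not every sister acknowledged, so MS does not proceed.
        events.append(f"{ms}: waiting on {sorted(pending)}")
        return events

    for step in FINALIZE_STEPS:
        events.append(f"{ms}: {step}")
    # Open question: at what point are the sisters told to start the job?
    return events
```

Calling run_join_phase(["node0", "node1", "node2"], lambda s: "ALL_OKAY")
lists the two JOIN_JOB/ALL_OKAY exchanges followed by the four
TMomFinalize* steps on node0. Nowhere in that list is there a step
where the sisters are told to start, which is exactly the part I
cannot find.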
Thanks,
Prakash