[torqueusers] Torque Ignores PIDS of MPI processes

Garrick Staples garrick at usc.edu
Mon Jul 23 22:02:02 MDT 2007


On Mon, Jul 23, 2007 at 11:22:01AM +0000, Joshua Bernstein alleged:
> Hello All,
> 
>     I'm having an issue where it appears that TORQUE, when attempting 
> to cancel an MPI job, kills only two of the six processes that are 
> running. For example, I have a job script as follows:
> 
> --SNIP--
> #PBS -N cpuhog
> #PBS -j oe
> echo "MAP: $BEOWULF_JOB_MAP"
> cd $PBS_O_WORKDIR
> /usr/bin/mpirun -np 4 ./cpuhog01 256 0:10:00
> echo "DONE"
> --END SNIP--
> 
> This ends up in 6 processes being started by TORQUE's pbs_mom. Two of 
> them are "bash", and the four others are the "cpuhog01" processes 
> started by mpirun. Now when I attempt to kill or cancel this job, TORQUE 
> successfully kills the two "bash" processes but keeps around the four 
> other "cpuhog01" processes.

TORQUE can kill processes that are either part of the same process group
as the batch script, or are direct children of any MOM in the job.

TORQUE has no way of knowing that any other process is part of a job.
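
To make "same process group" concrete, here is a minimal Linux-only sketch
(it is not TORQUE code).  It assumes /proc is mounted and that setsid(1)
from util-linux is installed; the proc_ids helper and the variable names
are made up for the illustration.  It starts one ordinary child and one
child in a new session, then prints the process group and session ID of
each next to the script's own:

```shell
#!/bin/bash
# Linux-only sketch: reads process group and session IDs from /proc.

# proc_ids PID: print "PGRP SESSION" for PID; fail if the process is gone.
proc_ids() {
    local stat
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # After the "(comm)" entry the fields are: state ppid pgrp session ...
    set -- ${stat##*) }
    echo "$3 $4"
}

sleep 60 &               # ordinary child: stays in this script's process group
inside=$!
setsid sleep 60 &        # setsid(1) runs the command in a new session and group
outside=$!
sleep 1                  # give setsid time to create the new session

script_ids=$(proc_ids $$)
inside_ids=$(proc_ids "$inside")
outside_ids=$(proc_ids "$outside")

echo "script:  $script_ids"
echo "inside:  $inside_ids"
echo "outside: $outside_ids"

# Clean up both demo processes by PID.
kill "$inside" "$outside"
wait "$inside" "$outside" 2>/dev/null || true
```

The "inside" child reports the script's process group, so a signal sent
to that group reaches it.  The "outside" child reports a group of its
own, so only a signal aimed at its PID (or its new group) reaches it.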

 
> I'm aware of mpiexec for starting jobs under TORQUE, but I'm wondering 
> how exactly mpiexec is able to tell TORQUE about these extra processes 
> and why TORQUE isn't aware of them on its own. I know there is this TM 
> Interface. Is this documented anywhere? Why isn't TORQUE able to do this 
> on its own?

'man tm' :)

TM, the "task manager" interface, spawns remote processes through
"inter-mom" procedure calls.  It can also send signals, deliver exit
notifications through obituary notices, and a few other things.

For tm_spawn(), the local MOM tells a sister MOM to spawn a process; and
since the new process is a direct child of the sister, it is known to be
part of the job.

 
> In my mind, a job is really a shell script that gets started by a 
> pbs_mom. If I call mpirun from inside that job, which then forks new 
> processes, TORQUE should be able to track which PIDs a particular job 
> has spawned by looking at the PPID to PID relationship.

It does.  However, sometimes grandparent processes are killed before
grandchild processes, which breaks the PPID relationship: the orphaned
grandchildren are re-parented to 'init'.  The scripts/jobs really should
be doing the "right thing" to clean up the kids.
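
One way a job script can clean up its own kids is to run the launcher in
the background and forward the termination signal to it.  The following
is a rough bash sketch, not something TORQUE ships; run_and_reap is a
hypothetical helper name.  It only signals the direct child, so it
assumes the launcher passes the signal on to whatever it started:

```shell
#!/bin/bash
# run_and_reap CMD [ARGS...]
# Run CMD in the background and wait for it.  If this shell receives
# SIGTERM or SIGINT while waiting, forward SIGTERM to CMD so that it
# does not outlive the script.
run_and_reap() {
    "$@" &
    local child=$!
    local status=0
    # $child is expanded now, so the handler does not depend on scope.
    trap "kill -TERM $child 2>/dev/null" TERM INT
    # wait returns early (status > 128) if a trapped signal arrives.
    wait "$child" || status=$?
    trap - TERM INT
    # If wait was interrupted, wait again so the child is really gone.
    wait "$child" 2>/dev/null || true
    return "$status"
}
```

In the job script from the original post the launch line would become
"run_and_reap /usr/bin/mpirun -np 4 ./cpuhog01 256 0:10:00".  Whether
that mpirun then signals its remote ranks is up to mpirun itself.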

 
> I'd really like to avoid having to use mpiexec, as our mpirun is already 
> optimized for starting jobs on remote nodes without the use of rsh/ssh; 
> instead it takes advantage of BProc. Our version of MPICH uses bproc to 
> fork processes on the remote nodes. It therefore doesn't require rsh or ssh.

mpiexec, using TM, doesn't use rsh/ssh.

pbs_mom doesn't have any direct support for BProc.

 
> Any insight as to how TORQUE becomes aware of what PIDS to kill when a 
> Job is killed would be helpful as well as any other discussion along 
> these lines.
> 
> It might be worth mentioning this is Linux x86_64/CentOS 4.5. Thanks!

When running under PBS, TM is really the best way to go.

I hope that answers your questions.
