[torqueusers] LAM/MPI + Torque

シピオニ ロベルト SCIPIONI.Roberto at nims.go.jp
Fri Jun 29 23:28:18 MDT 2007


Interesting!
How do you configure LAM/MPI at compile time to do that?
Is it with

--with-boot=tm

or something similar?


Roberto


> On Thu, 2007-06-28 at 17:41 -0700, SCIPIONI Roberto wrote:
> > As far as I understand you need to tell Torque to boot the LAM
> > properly inside the script
> 
> This is not necessary if you use mpiexec.  In our setup I configured
> LAM/MPI (our current version is lam-7.1.3) to use the resource manager
> interface.  This is an option to the LAM/MPI configure script when you
> compile _LAM_.  Then LAM will talk to Torque directly to get the
> nodes, boot LAM and also distribute the jobs to the nodes directly
> (no need for rsh/ssh).
> 
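> For reference, the configure invocation on our cluster was roughly the
> following (flag spelling from memory, so check ./configure --help for
> your LAM version; the prefix is just an example, and if Torque is
> installed somewhere non-standard you will also have to tell configure
> where to find it):
> 
>   ./configure --prefix=/opt/lam-7.1.3 --with-boot=tm
>   make
>   make install
> 
> After installing, laminfo should list tm among the boot modules.
> 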
> OLD command (note you need to manually specify the number of processes):
> 
> mpiexec -machinefile $PBS_NODEFILE -n ?? [your script]
> 
> NEW command:
> 
> mpiexec -boot [your script]
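> 
> A complete (if minimal) job script using the new command looks roughly
> like this -- the resource requests and program name are placeholders,
> adjust them for your own jobs:
> 
>   #!/bin/sh
>   #PBS -l nodes=2:ppn=4
>   #PBS -l walltime=01:00:00
>   cd $PBS_O_WORKDIR
>   mpiexec -boot ./your_program
> 
> No machinefile and no -n: mpiexec boots LAM and gets the node list
> from Torque itself.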
> 
> 
> Apart from being simple, it has two other big advantages.
> 
> * You get meaningful usage information from qstat etc.; without this a
> running MPI job will appear to use no CPU.
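> (For example, qstat -f <jobid> then reports a sensible
> resources_used.cput for the job instead of essentially zero.)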
> 
> * All processes stay under the control of the queue system.  Manually
> running LAM (OLD command) with rsh/ssh tends to leave orphaned LAM
> daemons on nodes, which the system administrator has to kill by hand:
> logging into each node, checking the daemon is unused and then killing
> it.  I used to do this about once a week.  Fortunately the pestat
> command is useful for detecting orphaned daemons, as it lists the
> number of job processes on each node; if you have more processes than
> jobs currently running on a node then you usually have an orphan.
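> 
> If you do find a suspect node, something along these lines is enough
> to confirm it and clean up by hand (lamd is the LAM daemon; check that
> the owner has no job left on that node before killing anything):
> 
>   ps -C lamd -o user,pid,etime,cmd
>   kill <pid>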
> 
> 
> 
> Currently we allow users to use either method, but I am only teaching
> the new command to new users.
> 
> One point is that you need to use the mpiexec program that comes with
> LAM.  There is another mpiexec program
> (http://www.osc.edu/~pw/mpiexec/index.php) which provides similar
> functionality for other MPI implementations but doesn't work with
> LAM/MPI.
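> 
> A quick sanity check is:
> 
>   which mpiexec
> 
> and make sure it points into your LAM installation rather than one of
> the others.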
> 
> Cheers
> Justin
> 
> -- 
> Dr Justin Finnerty
> Rm W3-1-218         Ph 49 (441) 798 3726
> Carl von Ossietzky Universität Oldenburg
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 

