[torqueusers] PVM, mpiexec, tm, and pbsdsh

mgutteri at fhcrc.org
Tue Dec 6 00:05:34 MST 2005


I was reviewing the pvmd3 manpage and saw this:

  The following options are used by the master pvmd when starting slaves 
  and are only of interest to someone writing a hoster.  Don’t just go 
  using them, now.
       -s     Start pvmd in slave mode.  Hostfile cannot be used, five   
              additional parameters  must  be  supplied:  master pvmd 
              index, master IP, master MTU, slave pvmd index, and slave IP.

This suggests to me that one could perhaps use pbsdsh to start the pvmd slaves
this way.  I am uncertain how to get the parameters that would have to be
provided to the slave processes, so I'm going to look into this "hoster"
business...
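
A minimal sketch of the idea, for illustration only: it assembles a slave
command line with the five parameters in the order the manpage lists them and
prints it, without starting anything. The helper name build_slave_cmd and all
example values are made up, and the exact format pvmd expects for each field
is an assumption that would need checking against the PVM source.

```shell
#!/bin/bash
# Hypothetical helper: print a "pvmd3 -s" command line with the five
# parameters in the order the manpage lists them:
#   master pvmd index, master IP, master MTU, slave pvmd index, slave IP
# The format of each field is NOT verified here; values pass through as-is.
build_slave_cmd() {
    if [ "$#" -ne 5 ]; then
        echo "usage: build_slave_cmd M_INDEX M_IP M_MTU S_INDEX S_IP" >&2
        return 1
    fi
    printf 'pvmd3 -s %s %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Dry run with placeholder values (not taken from a real master pvmd).
build_slave_cmd 1 192.0.2.1 1500 2 192.0.2.2
```

Since pbsdsh runs the same command on every node, a per-node wrapper would
still have to work out its own slave index and IP; that is the open question
above.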

However, Garrick may well beat me to it if he can clone rsh functionality into
pbsdsh 8-)..


Quoting L.S.Lowe at bham.ac.uk:

> Yep, I've been looking at this for the last couple of days. One of my main
> concerns is to get the accounting right, so that all the work done under
> the PVM daemons gets accounted into the torque job. Normally with PVM, the
> pvmd daemons are disconnected processes spawned by sshd or in.rshd, so
> accounting for PVM work never finds its way into Torque.
> 
> So I've tried the following scenario and it seems to work .... so far!
> In the user job:
> 
>          pbsdsh /path/to/pvmlisten someargs &
>          export PVM_RSH=/path/to/pvmcaller
>          ... start up PVM ...
> 
> The pbsdsh'd pvmlisten script sets up a listener on a high tcp port on
> each worker. When the user job then starts up PVM, PVM uses the PVM_RSH
> command for each added worker, so our pvmcaller calls through to each
> pvmlisten, which starts the pvmd daemon with the no-fork option and the
> arguments that the master pvmd passes on. There's some output from the
> slave pvmd which finds its way back to the master. User PVM processes then
> get hung off the worker's pvmd.
> 
> At the moment my pvmlisten and pvmcaller are bash scripts which make use
> of the netcat/nc command. That's not ideal, but it works as far as I've
> tested it - the hello-world PVM job! This is in testing on a cluster which
> doesn't have rsh/ssh between workers.
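
The caller half of this scheme might look roughly like the sketch below. It
is not Lawrence's actual script: the function name, the PVM_LISTEN_PORT
variable, and the default port are hypothetical, and the handling of an
rsh-style "-l user" argument is an assumption about how PVM invokes the
PVM_RSH command.

```shell
#!/bin/bash
# Hypothetical port on which the pbsdsh-started listener waits on each worker.
PVM_LISTEN_PORT="${PVM_LISTEN_PORT:-45000}"

# Sketch of a PVM_RSH replacement: called rsh-style as "HOST [-l USER] CMD...".
pvmcaller() {
    local host="$1"
    shift
    # Assumption: an rsh-style "-l user" may follow the host name. The
    # listener already runs as the job owner, so the pair is dropped.
    if [ "$1" = "-l" ]; then
        shift 2
    fi
    # Send the remaining words (the pvmd command line) to the listener as one
    # line; whatever the listener sends back arrives on our stdout, which is
    # where the master pvmd reads the slave's startup output.
    printf '%s\n' "$*" | nc "$host" "$PVM_LISTEN_PORT"
}
```

The listener side would read that one line and start the pvmd with it. The
listen flags of nc differ between netcat variants, as does whether the client
waits for a reply after its input ends, so both would need checking on the
cluster in question.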
> 
> Any comments/gotchas/ideas on a better way?
> 
> Lawrence Lowe.
> -- 
> 
> On Fri, 2 Dec 2005, Garrick Staples wrote:
> 
> > On Fri, Dec 02, 2005 at 06:41:56PM -0800, mgutteri at fhcrc.org alleged:
> > > 
> > > 
> > > I've seen a lot of traffic lately around (basically) tm* functions.  I
> > > use PVM heavily 'round here, and am a bit dissatisfied with the
> > > integration between torque and pvm (no fault of anyone, just a need
> > > we've got here).
> > > 
> > > I got to thinking that there might be a way to work the two together.
> > > Has anyone worked with spawning PVM slaves via pbsdsh?  pvmd does have a
> > > "manual startup" method that I thought might have some traction to it.
> > > Otherwise, would patches to PVM integrate the two?  The ssh
> > > functionality was added-in, would it be possible to integrate something
> > > where one could set "PVM_RSH" to pbsdsh?
> > > 
> > > I am interested in looking into this deeper, though I'm finding scant
> > > resources WRT the tm interface outside the manpages in torque.  Are
> > > those current enough?  They seemed dated.  Are there any other
> > > references that would be useful in such a project?  Has anyone else
> > > been on this snipe hunt?
> > 
> > Yes, I'm dying to make an "rsh" clone for TM.  I'm doing it as fast as
> > I can!
> > 
> > 





