[torqueusers] PVM, mpiexec, tm, and pbsdsh
L.S.Lowe at bham.ac.uk
Sat Dec 3 12:35:22 MST 2005
Yep, I've been looking at this for the last couple of days. One of my main
concerns is to get the accounting right, so that all the work done under
the PVM daemons gets accounted into the torque job. Normally with PVM, the
pvmd daemons are disconnected processes spawned by sshd or in.rshd, so
accounting for PVM work never finds its way into Torque.
So I've tried the following scenario and it seems to work .... so far!
In the user job:
    pbsdsh /path/to/pvmlisten someargs &
    ... start up PVM ...
The pbsdsh'd pvmlisten script sets up a listener on a high TCP port on
each worker. When the user job then starts up PVM, PVM runs the PVM_RSH
command for each added worker. With PVM_RSH pointing at our pvmcaller
script, each call goes through to that worker's pvmlisten, which starts the
pvmd daemon with the no-fork option and the arguments that the master pvmd
passes on. There's some output from the slave pvmd which finds its way back
to the master. User PVM processes then get hung off the worker's pvmd.
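To make the listener/caller exchange above a bit more concrete, here is a rough Python sketch of the same pattern. It is not my actual pvmlisten/pvmcaller (those are bash scripts around nc); the names serve_once, call and demo are made up for illustration, and a harmless stand-in command takes the place of pvmd. It also waits for the command to finish before replying, which a long-running no-fork pvmd would not do, so a real version would have to stream the startup output back instead.

```python
import shlex
import socket
import subprocess
import sys
import threading


def serve_once(server_sock, base_cmd):
    """Listener side: accept one connection, read one line of arguments,
    run base_cmd with those arguments, and send its output back."""
    conn, _ = server_sock.accept()
    with conn:
        with conn.makefile("r") as rfile:
            line = rfile.readline()
        args = shlex.split(line)
        # The child is started by this process, so it stays in the
        # listener's process tree rather than under sshd or in.rshd.
        proc = subprocess.run(
            base_cmd + args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        conn.sendall(proc.stdout)


def call(host, port, args):
    """Caller side: send the arguments as one shell-quoted line and
    return whatever the listener sends back."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((shlex.join(args) + "\n").encode())
        sock.shutdown(socket.SHUT_WR)  # tell the listener we are done sending
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def demo():
    """Run listener and caller in one process over the loopback interface."""
    # Stand-in for pvmd: a command that just prints the arguments it received.
    stand_in = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free high port
    server.listen(1)
    port = server.getsockname()[1]
    listener = threading.Thread(target=serve_once, args=(server, stand_in))
    listener.start()
    try:
        return call("127.0.0.1", port, ["-n", "host name"])
    finally:
        listener.join()
        server.close()
```

Calling demo() sends the two arguments through the socket and returns what the stand-in command printed. The arguments are shell-quoted on the wire so that one containing a space survives the trip as a single argument.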
At the moment my pvmlisten and pvmcaller are bash scripts which make use
of the netcat/nc command. That's not ideal, but it works as far as I've
tested it - the hello-world PVM job! This is in testing on a cluster which
doesn't have rsh/ssh between workers.
Any comments/gotchas/ideas on a better way?
On Fri, 2 Dec 2005, Garrick Staples wrote:
> On Fri, Dec 02, 2005 at 06:41:56PM -0800, mgutteri at fhcrc.org alleged:
> > I've seen a lot of traffic lately around (basically) tm* functions. I use PVM
> > heavily 'round here, and am a bit dissatisfied with the integration between
> > torque and pvm (no fault of anyone, just a need we've got here).
> > I got to thinking that there might be a way to work the two together. Has
> > anyone worked with spawning PVM slaves via pbsdsh? pvmd does have a "manual
> > startup" method that I thought might have some traction to it. Otherwise,
> > would patches to PVM integrate the two? The ssh functionality was added-in,
> > would it be possible to integrate something where one could set "PVM_RSH" to
> > pbsdsh?
> > I am interested in looking into this deeper, though I'm finding scant resources
> > WRT the tm interface outside the manpages in torque. Are those current enough?
> > They seemed dated. Are there any other references that would be useful in such
> > a project? Has anyone else been on this snipe hunt?
> Yes, I'm dying to make an "rsh" clone for TM. I'm doing it as fast as
> I can!