[torqueusers] Multiple concurrent calls of pbsdsh from same job script
Garrick Staples
garrick at usc.edu
Tue Nov 1 09:08:07 MST 2005
On Tue, Nov 01, 2005 at 04:11:48PM +0200, Martin Schafföner alleged:
> I am in the process of discontinuing rsh access to compute nodes and replacing
> it by TM-based spawns. We have a suite of perl helper scripts which split and
> distribute certain tasks to nodes allocated to a job. This is achieved by
> forking and execing a synchronous rsh call (the ones I would like to
> discontinue).
>
> Now I thought of replacing the rsh calls by pbsdsh calls. However, while a
> pbsdsh waits for the spawned process to finish, I cannot spawn more processes
> using another pbsdsh. It returns "pbsdsh: tm_init failed, rc =
> TM_ENOTCONNECTED (17002)" and Mother Superior's log has
> "pbs_mom;Svr;pbs_mom;tm_request, extra TM connect from 3790.cluster task 1".
>
> Does anybody know an easy solution to this problem?
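Yes, this is a limitation of the TM interface: as the MOM log shows, a
second TM connect from the same task is rejected while the first is
still open. Pete Wyckoff's mpiexec has a pretty neat method to work
around this. You can use it to spawn non-MPI commands with "-comm none".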
http://www.osc.edu/~pw/mpiexec/
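
A minimal sketch of what that could look like in a job script, assuming the
installed mpiexec supports several invocations running at once inside one
job. Only "-comm none" comes from this thread; the "-n 1" option (one
process per call) should be checked against the mpiexec man page, and
spawn_task and DRY_RUN are hypothetical names used for illustration only.

```shell
#!/bin/sh
# Sketch: start several non-MPI commands from one job script through
# mpiexec instead of pbsdsh, then wait for all of them.

# If mpiexec is not on PATH, only print the commands (assumption made so
# the sketch can be tried outside a batch job).
command -v mpiexec >/dev/null 2>&1 || DRY_RUN=1

# spawn_task is a hypothetical helper, not part of mpiexec or TORQUE.
# It runs its arguments as a single non-MPI process via mpiexec.
spawn_task() {
    if [ -n "$DRY_RUN" ]; then
        echo mpiexec -comm none -n 1 "$@"
    else
        mpiexec -comm none -n 1 "$@"
    fi
}

# Launch two tasks in the background, then block until both finish.
spawn_task hostname &
spawn_task uptime &
wait
```

The helper scripts that currently fork and exec a synchronous rsh could
call something like spawn_task in the same place. Node placement is left
to mpiexec here, which may not match what the rsh calls did.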
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California