[torqueusers] Multiple concurrent calls of pbsdsh from same job script

Garrick Staples garrick at usc.edu
Tue Nov 1 09:08:07 MST 2005


On Tue, Nov 01, 2005 at 04:11:48PM +0200, Martin Schafföner alleged:
> I am in the process of discontinuing rsh access to compute nodes and replacing 
> it by TM-based spawns. We have a suite of perl helper scripts which split and 
> distribute certain tasks to nodes allocated to a job. This is achieved by 
> forking and execing a synchronous rsh call (the ones I would like to 
> discontinue).
> 
> Now I thought of replacing the rsh calls by pbsdsh calls. However, while a 
> pbsdsh waits for the spawned process to finish, I cannot spawn more processes 
> using another pbsdsh. It returns "pbsdsh: tm_init failed, rc = 
> TM_ENOTCONNECTED (17002)" and Mother Superior's log has 
> "pbs_mom;Svr;pbs_mom;tm_request, extra TM connect from 3790.cluster task 1". 
> 
> Does anybody know an easy solution to this problem?

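Yes, this is a limitation of the TM interface.  As the MOM log message
indicates, a second TM connection from the same task is rejected while
the first pbsdsh is still connected.  Pete Wyckoff's mpiexec has a
pretty neat method to work around this.  You can use it to spawn
non-MPI commands with "-comm none".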

http://www.osc.edu/~pw/mpiexec/
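
As a rough sketch of how a helper script might build such a call in
place of the rsh invocation: the only option taken from this thread is
"-comm none".  The "-n" process count, the function names and the
"mpiexec" binary name on PATH are assumptions for illustration, so
check them against the mpiexec documentation for your version.

```python
import subprocess


def build_mpiexec_cmd(command, nprocs=None, mpiexec="mpiexec"):
    """Return the argv list for running a non-MPI command under mpiexec.

    "-comm none" tells mpiexec not to set up any MPI communication.
    "-n" (assumed here) limits how many processes are started.
    """
    argv = [mpiexec, "-comm", "none"]
    if nprocs is not None:
        argv += ["-n", str(nprocs)]
    argv += list(command)
    return argv


def run_task(command, nprocs=None):
    """Run the command synchronously and return mpiexec's exit status."""
    argv = build_mpiexec_cmd(command, nprocs)
    return subprocess.run(argv, check=False).returncode
```

Inside a job script, run_task(["hostname"], nprocs=2) would then stand
in for one synchronous rsh call.  It has to run inside a job, since
mpiexec relies on the allocation made by the batch system.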


-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California