[torqueusers] pbsdsh implementation bug?

Garrick Staples garrick at usc.edu
Fri Jan 4 16:35:15 MST 2008

On Fri, Jan 04, 2008 at 04:45:19PM +0000, Marcin Mogielnicki alleged:
> Hi all,
> The problem is that aux/JOBID entries do not work with 'pbsdsh -h'.
> I'm trying to use the pbsdsh -h option and I discovered that the node
> names from aux/JOBID are not used. What pbsdsh does is grep the name out
> of the 'uname -a' output for every node and compare it to the -h
> argument. The implication is that the hostname must be exactly the same
> as the label given in the torque nodes file, which is often not true.
> If I want to execute something on the second assigned node, for
> example, I definitely expect the second entry from the aux/JOBID file
> to work...
> It doesn't look like a big problem, but in fact it sometimes is. I have
> encountered a bunch of commercial applications that require defining
> the command used to execute anything on remote nodes. For example,
> 'rsh -l %U %H' should be replaced by 'pbsdsh -h %H' in one specific
> case. The application takes node names from pbs. A crash is guaranteed
> here.
> It looks like a programmer's shortcut to me: the proper, longer
> operations on a bunch of structures, needed to get the pbs-defined node
> name, were replaced by a few lines based on a mostly correct
> assumption. Mostly, but not always. What I'm interested in is
> confirming whether this is considered buggy behaviour (i.e. a design
> flaw) at all. Are there any plans to fix it?

Yes, it was a quick solution.  Looking at the uname output was the only way
that pbsdsh could see the hostname since it is never actually passed the

The real solution is entirely non-trivial, requiring TM protocol changes.
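
To make the mismatch concrete, here is a rough sketch of the two lookups being
discussed. It is not taken from the pbsdsh source; the function names, the
sample labels, and the sample hostnames are all made up for illustration, and
it assumes one label per assigned slot, listed in the same order as the
hostnames.

```python
# Hypothetical illustration only; this is not TORQUE or pbsdsh code.

def match_by_uname(requested, uname_nodenames):
    """Mimic the behaviour described above: compare the -h argument with
    the hostname each node reports about itself, exact match only."""
    return [i for i, name in enumerate(uname_nodenames) if name == requested]


def match_by_label(requested, nodefile_labels):
    """What the original poster expected: look the name up among the
    labels listed in the aux/JOBID file."""
    return [i for i, label in enumerate(nodefile_labels) if label == requested]


# Made-up data: the nodes file uses short labels, while the nodes report
# longer hostnames (assumption for this example).
labels = ["n01", "n01", "n02", "n02"]
hostnames = [
    "n01.cluster.example",
    "n01.cluster.example",
    "n02.cluster.example",
    "n02.cluster.example",
]

print(match_by_uname("n02", hostnames))  # prints []
print(match_by_label("n02", labels))     # prints [2, 3]
```

With names like these, an application that passes the label "n02" to
'pbsdsh -h' finds nothing to run on, even though the label appears twice in
the job's node list.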
