[torqueusers] pbsdsh implementation bug?
garrick at usc.edu
Fri Jan 4 16:35:15 MST 2008
On Fri, Jan 04, 2008 at 04:45:19PM +0000, Marcin Mogielnicki alleged:
> Hi all,
> The problem is that aux/JOBID entries do not work with 'pbsdsh -h'.
> I'm trying to use the pbsdsh -h option and I discovered that the PBS node
> names from aux/JOBID are not used. What pbsdsh does is grep the name out
> of the 'uname -a' output for every node and compare it to the -h argument.
> The implication is that the hostname must be exactly the same as the label
> given in the torque nodes file, which is often not the case. If I want to
> execute something on, for example, the second node assigned, I definitely
> expect the second entry from the aux/JOBID file to work...
> It doesn't look like a big problem, but in fact it sometimes is. I have
> encountered a number of commercial applications that require defining the
> command used to execute anything on remote nodes. For example, 'rsh -l %U %H'
> should be replaced by 'pbsdsh -h %H' in one specific case. The
> application takes node names from pbs, so a crash is guaranteed here.
> It looks like a programmer's shortcut to me: the proper, lengthy operations
> on a bunch of structures needed to obtain the pbs-defined node name were
> replaced by just a few lines based on a mostly right assumption. Mostly,
> but not always. What I'd like is confirmation of whether this is considered
> buggy behaviour (i.e. a design flaw) at all. Are there any plans to fix it?
Yes, it was a quick solution. Looking at the uname output was the only way
that pbsdsh could see the hostname since it is never actually passed the
The real solution is entirely non-trivial requiring TM protocol changes.
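
Until the protocol changes, one possible workaround is to avoid the hostname comparison altogether and address the node by position with 'pbsdsh -n'. The following is only a rough sketch, not anything pbsdsh does itself: the helper names node_index and pbsdsh_command are made up for illustration, and it assumes that the lines of the aux/JOBID file appear in the same order as the node numbers that 'pbsdsh -n' accepts, counted from zero. That assumption should be checked on the cluster in question before relying on it.

```python
def node_index(nodefile_path, label):
    """Return the 0-based position of the first line equal to `label`.

    The aux/JOBID file usually repeats a host once per allocated
    processor, so only the first occurrence is used here.
    """
    with open(nodefile_path) as fh:
        for index, line in enumerate(fh):
            if line.strip() == label:
                return index
    raise LookupError(f"{label!r} is not listed in {nodefile_path}")


def pbsdsh_command(nodefile_path, label, program_args):
    """Build an argv list that runs `program_args` via `pbsdsh -n <index>`.

    ASSUMPTION: the index into the node file matches the node number
    expected by `pbsdsh -n`; this is not guaranteed by the thread above.
    """
    index = node_index(nodefile_path, label)
    return ["pbsdsh", "-n", str(index), *program_args]
```

A small wrapper script built on this could stand in for 'pbsdsh -h %H' in the application's remote-command setting, passing the path of the aux/JOBID file (normally available inside a job as $PBS_NODEFILE) and the %H value as the label.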