[torqueusers] pbsdsh implementation bug?
pw at osc.edu
Mon Jan 7 08:23:04 MST 2008
mar at pism.pl wrote on Mon, 07 Jan 2008 00:09 +0000:
> > Yes, it was a quick solution. Looking at the uname output was the only
> > way that pbsdsh could see the hostname, since it is never actually passed
> > the hostlist.
> > The real solution is entirely non-trivial, requiring TM protocol changes.
> Hm, what about a quick workaround for this workaround? A kind of
> hostname-transform switch, like in mpiexec, would do the trick in all cases
> and seems trivial to add. It would be easy to get the entry from aux/jobid
> and make it work. I'll add it anyway, but it would be nice to have it in
> the official source code.
The problem with pbsdsh is that it uses the "uname name", not
the "PBS name". If you use FQDNs for your hostnames, then
these likely differ. You could hack pbsdsh to add an argument
"--no-fqdn" that truncates the hostname at the first ".". Or you
could talk to the server or read $PBS_NODEFILE to get the list
of PBS names, like mpiexec does.
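
To make the two options concrete, here is a rough sketch in Python
rather than a patch to pbsdsh (which is C). The function names are
made up for illustration and are not part of pbsdsh or mpiexec. It
assumes $PBS_NODEFILE holds one node name per line, with a node
repeated once per processor slot, and that a "uname name" and a
"PBS name" refer to the same host when they agree up to the first ".".

```python
import os


def short_hostname(name):
    """Return the part of a hostname before the first '.'."""
    return name.split(".", 1)[0]


def read_pbs_names(nodefile_path):
    """Read node names from a PBS node file, keeping order, dropping repeats."""
    names = []
    with open(nodefile_path) as f:
        for line in f:
            name = line.strip()
            if name and name not in names:
                names.append(name)
    return names


def find_pbs_name(uname_name, pbs_names):
    """Return the PBS name matching a uname-style hostname, or None."""
    # An exact match wins; otherwise compare the short (non-FQDN) forms.
    if uname_name in pbs_names:
        return uname_name
    target = short_hostname(uname_name)
    for name in pbs_names:
        if short_hostname(name) == target:
            return name
    return None


if __name__ == "__main__":
    # Only meaningful inside a job, where PBS_NODEFILE is set.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        print(find_pbs_name(os.uname().nodename, read_pbs_names(nodefile)))
```

Matching on the short name works whichever side carries the FQDN,
but it would pick the wrong node if two hosts in different domains
shared a short name.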
I put up some ideas on how to use mpiexec itself as an rsh
replacement. It's not very well thought out, but you or
someone else may find it a useful starting point.
First entry here: