[torquedev] torque 2.4 and OSC's mpiexec
jbernstein at penguincomputing.com
Tue Nov 24 22:24:24 MST 2009
What about patching mpiexec?
On Nov 24, 2009, at 9:07 PM, "Glen Beane" <glen.beane at gmail.com> wrote:
> I discovered a problem with OSC's mpiexec and torque 2.4.x (and
> see bug #34:
> It appears that mpiexec is unable to get the exec_host attribute from
> mpiexec: Error: get_hosts: pbs_statjob did not return "exec_host"
> There are only a few changes in src/server/req_stat.c (where the stat
> job request is handled by the server) between 2.3 and 2.4, and I tried
> removing some of those changes to see if that was where the bug was
> introduced, but so far I have been unable to successfully locate the
> cause. Anyone else have any ideas? I could be looking in completely
> the wrong place - I haven't spent a ton of time on this.
> I use mpiexec with --comm=none as part of a wrapper script that tricks
> an application that uses rsh to start remote processes into using
> mpiexec (you can specify the "rsh" to use with an env variable).
> Until this problem is fixed, this script will not work with any TORQUE
> release after 2.3.x. I suspect others are in similar situations, and I
> think this is a critical bug.