[torquedev] torque 2.4 and OSC's mpiexec

Josh Bernstein jbernstein at penguincomputing.com
Tue Nov 24 22:24:24 MST 2009


What about patching mpiexec?

-Josh

On Nov 24, 2009, at 9:07 PM, "Glen Beane" <glen.beane at gmail.com> wrote:

> I discovered a problem with OSC's mpiexec and torque 2.4.x (and
> trunk).
>
> See bug #34:
> http://www.clusterresources.com/bugzilla/show_bug.cgi?id=34
>
> It appears that mpiexec is unable to get the exec_host attribute from
> pbs_statjob:
>
> mpiexec: Error: get_hosts: pbs_statjob did not return "exec_host" info.
>
>
> There are only a few changes in src/server/req_stat.c (where the stat
> job request is handled by the server) between 2.3 and 2.4. I tried
> reverting some of those changes to see whether that was where the bug
> was introduced, but so far I have been unable to locate the cause.
> Does anyone else have any ideas? I could be looking in completely the
> wrong place; I haven't spent much time on this.
>
> I use mpiexec with --comm=none as part of a wrapper script that tricks
> an application that uses rsh to start remote processes into using
> mpiexec (you can specify the "rsh" to use with an env variable).
> Until this problem is fixed, this script will not work with any TORQUE
> release after 2.3.x. I suspect others are in similar situations, and I
> think this is a critical bug.
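
To check from the command line whether the server still reports exec_host for a running job, one rough approach is to parse the text printed by `qstat -f <jobid>`. The sketch below is only an illustration: the function name `exec_hosts_from_qstat` is hypothetical, the sample text is made up, and it assumes the usual `host/slot+host/slot` value format with long values wrapped onto tab-indented continuation lines. It does not go through pbs_statjob the way mpiexec does, so it is a sanity check rather than a reproduction of the bug.

```python
def exec_hosts_from_qstat(text):
    """Return host names from the exec_host line of `qstat -f` style text.

    Raises KeyError("exec_host") when the attribute is absent, which is the
    condition mpiexec complains about in the error quoted above.
    """
    value = None
    for line in text.splitlines():
        stripped = line.strip()
        if value is None:
            # Look for the start of the attribute.
            if stripped.startswith("exec_host ="):
                value = stripped.split("=", 1)[1].strip()
        elif line.startswith("\t"):
            # Assumption: long values continue on tab-indented lines.
            value += stripped
        else:
            # Next attribute reached; the exec_host value is complete.
            break
    if value is None:
        raise KeyError("exec_host")
    # Each entry looks like host/slot; keep the host part, preserving order.
    return [entry.split("/", 1)[0] for entry in value.split("+") if entry]


# Made-up sample text, not output captured from a real server.
sample = (
    "Job Id: 123.head\n"
    "    Job_Name = test\n"
    "    exec_host = n1/1+n1/0+n2/0\n"
    "    queue = batch\n"
)
print(exec_hosts_from_qstat(sample))  # ['n1', 'n1', 'n2']
```

If the attribute shows up here but mpiexec still fails, that would point at the pbs_statjob request path rather than at the job record itself.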

