[torquedev] torque 2.4 and OSC's mpiexec

Glen Beane glen.beane at gmail.com
Tue Nov 24 22:07:06 MST 2009


I discovered a problem with OSC's mpiexec and torque 2.4.x (and trunk).

See bug #34:
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=34

It appears that mpiexec is unable to get the exec_host attribute from
pbs_statjob:

mpiexec: Error: get_hosts: pbs_statjob did not return "exec_host" info.
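One way to narrow this down might be to check whether "qstat -f" still reports exec_host for the same running job. The following is only a rough sketch in Python, not part of mpiexec or TORQUE. The helper names and the sample text are made up for illustration. It assumes the usual "name = value" layout of "qstat -f" output, long values wrapped onto tab-indented continuation lines, and the usual host/slot+host/slot form of exec_host.

```python
import re


def find_exec_host(qstat_full_output):
    """Return the exec_host value found in `qstat -f` text, or None if absent."""
    # Assumption: long values are wrapped onto continuation lines that start
    # with a tab, so join those back together before searching.
    unwrapped = qstat_full_output.replace("\n\t", "")
    match = re.search(r"^[ \t]*exec_host = (\S+)[ \t]*$", unwrapped, re.MULTILINE)
    return match.group(1) if match else None


def hosts_from_exec_host(exec_host):
    """Split 'nodeA/1+nodeA/0+nodeB/0' into unique host names, order preserved."""
    hosts = []
    for slot in exec_host.split("+"):
        host = slot.split("/", 1)[0]  # drop the '/<slot index>' suffix
        if host not in hosts:
            hosts.append(host)
    return hosts


# Illustrative sample only, not output captured from a real server.
SAMPLE = (
    "Job Id: 123.headnode\n"
    "    Job_Name = test\n"
    "    job_state = R\n"
    "    exec_host = nodeA/1+nodeA/0+nodeB/1+nodeB/0\n"
)

if __name__ == "__main__":
    value = find_exec_host(SAMPLE)
    if value is None:
        print("exec_host missing from job status")
    else:
        print(hosts_from_exec_host(value))
```

On a real system the text would come from something like "qstat -f $PBS_JOBID" run inside the job. If exec_host shows up there but mpiexec still fails, that would suggest the problem is specific to how mpiexec asks for the attribute rather than to the job status as a whole.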


There are only a few changes in src/server/req_stat.c (where the server
handles the stat job request) between 2.3 and 2.4. I tried backing out
some of them to see whether one of them introduced the bug, but so far
I have not been able to locate the cause. Does anyone else have any
ideas? I could be looking in completely the wrong place; I haven't
spent much time on this yet.

I use mpiexec with --comm=none as part of a wrapper script that tricks
an application that uses rsh to start remote processes into using
mpiexec instead (the application lets you specify the "rsh" command
through an environment variable). Until this problem is fixed, the
script will not work with any TORQUE release after 2.3.x. I suspect
others are in similar situations, and I think this is a critical bug.

