[torqueusers] Torque 4 with OSC mpiexec problem (still?)

Brock Palen brockp at umich.edu
Tue Sep 24 12:36:10 MDT 2013


Don't have details in front of me, but we get around this by no longer using OSC mpiexec and instead manually building mpiexec.hydra (part of MPICH).

We then manually set:
export HYDRA_LAUNCHER=pbs
export HYDRA_RMK=pbs

This works with MPICH, MVAPICH, and Intel MPI.
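
For anyone wanting to try the same route, here is a minimal sketch of a PBS job script using that setup. The node counts, walltime, rank count, and application name (./my_mpi_app) are placeholders; as I understand it, HYDRA_LAUNCHER=pbs makes Hydra spawn ranks through Torque's TM interface rather than ssh, and HYDRA_RMK=pbs makes it take the host list from $PBS_NODEFILE:

#!/bin/bash
#PBS -l nodes=2:ppn=8
#PBS -l walltime=01:00:00

# Launch processes through Torque's TM API instead of ssh,
# and read the host list from the PBS-provided node file.
export HYDRA_LAUNCHER=pbs
export HYDRA_RMK=pbs

cd $PBS_O_WORKDIR

# Rank count is illustrative; match it to nodes*ppn above.
mpiexec.hydra -n 16 ./my_mpi_app

Since the ranks are started through TM, Torque sees them as its own tasks, so cleanup and accounting should behave the way OSC mpiexec used to provide.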

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
brockp at umich.edu
(734)936-1985



On Sep 24, 2013, at 1:48 PM, Stephen Cousins <steve.cousins at maine.edu> wrote:

> I found this message from March:
> 
> http://www.supercluster.org/pipermail/torqueusers/2013-March/015807.html
> 
> about problems with Torque 4 and OSC mpiexec, as well as another one from 2012 indicating a Torque bug:
> 
> http://www.supercluster.org/pipermail/torqueusers/2012-July/014884.html
> 
> I am running Torque 4.2.4.1 after upgrading due to the recent Torque security problem. Now I see that our MVAPICH2 jobs that use the OSC mpiexec program don't always start well, and they never stop correctly: the program finishes, but the job stays queued until it is qdel'd or the walltime runs out.
> 
> I have started a case with Adaptive Computing but have yet to hear anything back, so I wondered if anyone on this list has insight and/or a fix for this.
> 
> Thanks,
> 
> Steve
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


