[torqueusers] Torque 4 with OSC mpiexec problem (still?)

Stephen Cousins steve.cousins at maine.edu
Fri Sep 27 14:13:19 MDT 2013


Hi Brock,

Since your email I've been testing this with MVAPICH2 2.0a (I know, that's
a question for another group, but how stable is 2.0a?), and so far it works
fine without setting any environment variables. In this build, mpiexec and
mpirun are symbolic links to mpiexec.hydra.
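
For anyone who does need the settings Brock describes below, here is a rough
sketch of how they might be wrapped in a job script. The variable names and
values come from his message; the helper names use_pbs_hydra and
launcher_target are made up for illustration, and readlink -f assumes GNU
coreutils.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Torque or MPICH.

# Export the two Hydra settings quoted in Brock's message so that
# mpiexec.hydra launches through PBS/Torque.
use_pbs_hydra() {
    export HYDRA_LAUNCHER=pbs
    export HYDRA_RMK=pbs
}

# Print the real file a launcher name resolves to, e.g. to confirm that
# mpiexec is a symbolic link to mpiexec.hydra.
# Returns non-zero if the name is not found on PATH.
launcher_target() {
    path=$(command -v "$1") || return 1
    readlink -f "$path"   # assumes GNU readlink
}
```

In a job script one would call use_pbs_hydra before the mpiexec line, and
launcher_target mpiexec is a quick way to check which launcher is actually
being picked up.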

Thanks!

Steve


On Tue, Sep 24, 2013 at 2:36 PM, Brock Palen <brockp at umich.edu> wrote:

> Don't have details in front of me, but we get around this by no longer
> using OSC mpiexec and instead manually building mpiexec.hydra (part of MPICH).
>
> We then manually set:
> export HYDRA_LAUNCHER=pbs
> export HYDRA_RMK=pbs
>
> This works with mpich/mvapich/intel-mpi.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> brockp at umich.edu
> (734)936-1985
>
>
>
> On Sep 24, 2013, at 1:48 PM, Stephen Cousins <steve.cousins at maine.edu>
> wrote:
>
> > I found this message from March:
> >
> > http://www.supercluster.org/pipermail/torqueusers/2013-March/015807.html
> >
> > about problems with Torque 4 and OSC mpiexec. Also another one from 2012
> > indicating a Torque bug:
> >
> > http://www.supercluster.org/pipermail/torqueusers/2012-July/014884.html
> >
> > I am running Torque 4.2.4.1 after upgrading due to the recent Torque
> > security problem. Now I see that our MVAPICH2 jobs that use the OSC mpiexec
> > program don't always start well and they never stop correctly. The program
> > stops but the job stays queued until qdel'd or walltime runs out.
> >
> > I have started a case with Adaptive Computing but have yet to hear
> > anything so I wondered if anyone on this list has insight and/or a fix for
> > this.
> >
> > Thanks,
> >
> > Steve
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
>



-- 
________________________________________________________________
 Steve Cousins             Supercomputer Engineer/Administrator
 Advanced Computing Group            University of Maine System
 244 Neville Hall (UMS Data Center)              (207) 561-3574
 Orono ME 04469                      steve.cousins at maine.edu