[torqueusers] Torque with MPICH kills jobs consistently,
but OpenPBS works fine
velayups at email.uc.edu
Wed Nov 30 08:50:42 MST 2005
I have an 18-node cluster running Torque-1.2.0p5 with MPICH-1.2.7
and mpiexec-0.8.
When I run an MPI application, the job gets killed after a random
amount of time.
The error given is "p0_7360: p4_error: interrupt SIGx: 15"
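The 15 in that message appears to be a signal number. On Linux, signal 15 is SIGTERM, which would suggest the process was terminated from outside rather than crashing on its own. A minimal sketch to confirm the mapping on a given node (the helper name describe_signal is made up for illustration):

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    try:
        # signal.Signals is the standard-library enum of known signals.
        return signal.Signals(number).name
    except ValueError:
        # The number does not correspond to a signal known on this platform.
        return f"unknown signal {number}"


# 15 is the number reported in the p4_error line above.
print(describe_signal(15))
```

This only decodes the number; it does not show which component sent the signal.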
When I replace Torque with OpenPBS-2.3.16 and keep everything else
the same, the job completes just fine. Of course, I recompiled
mpiexec to use OpenPBS.
Any thoughts please?