[torqueusers] Torque with MPICH kills jobs consistently, but OpenPBS works fine

Prakash Velayutham velayups at email.uc.edu
Wed Nov 30 08:50:42 MST 2005


Hi All,

I have a cluster with 18 nodes. I have Torque-1.2.0p5 with MPICH-1.2.7 
and mpiexec-0.8 running.
When I run an MPI application, after a random amount of time, I see that 
the job gets killed.
The error given is "p0_7360:  p4_error: interrupt SIGx: 15"
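
For reference, the 15 in that message is a signal number, which on POSIX 
systems is normally SIGTERM. That would suggest the MPI process is being 
terminated from outside rather than crashing on its own, though this is 
an inference, not something confirmed for this cluster. A minimal sketch 
for checking the mapping on a node (the helper name signal_name is made 
up for illustration, and Python is not part of the setup described here):

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name this platform assigns to a signal number."""
    # signal.Signals is the standard-library enum of signals known on this OS.
    return signal.Signals(number).name


if __name__ == "__main__":
    # 15 is the number reported in the p4_error message.
    print(signal_name(15))
```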

When I replace Torque with OpenPBS-2.3.16, with everything else remaining 
the same, the job completes just fine. Of course, I recompiled 
mpiexec to use OpenPBS.

Any thoughts please?

Thanks,
Prakash

