[torqueusers] Torque with MPICH kills jobs consistently, but OpenPBS works fine
Garrick Staples
garrick at usc.edu
Tue Apr 11 13:56:56 MDT 2006
On Mon, Apr 10, 2006 at 05:23:25PM -0400, Prakash Velayutham alleged:
> Garrick Staples wrote:
> >On Mon, Dec 05, 2005 at 05:12:08PM -0500, Prakash Velayutham alleged:
> >
> >>11/08/2005 11:23:32;0008;PBS_Server;Job;50645.ribosome.cchmc.org;Job Run
> >>at request of Scheduler at ribosome.cchmc.org
> >>
> >>11/08/2005 11:24:48;0100;PBS_Server;Req;;Type JobObituary request
> >>received from pbs_mom at tyrosine.bmicluster1.cchmc.org, sock=9
> >>11/08/2005
> >>
> >Don't see an external job delete...
> >
> >
> >>Here is the mom log:
> >>
> >>11/08/2005 11:22:30;0001; pbs_mom;Job;TMomFinalizeJob3;job
> >>50645.ribosome.cchmc.org started, pid = 2806
> >>11/08/2005 11:22:31;0008;
> >>pbs_mom;Job;50645.ribosome.cchmc.org;start_process: task started, tid 2,
> >>sid 2866, cmd /bin/sh
> >>11/08/2005 11:23:37;0008;
> >>pbs_mom;Job;50645.ribosome.cchmc.org;kill_task: killing pid 2877 task 2
> >>with sig 9
> >>
> >Increase MOM's loglevel over 4, it should log why kill_task is being
> >called.
> Hi Garrick,
>
> I had not gotten time to test this earlier as I had been able to get
> some work done with OpenPBS + mpiexec + MPICH. But now I am back to this
> as I would really like the multi-server feature of Torque-2.
>
> I have Torque-2.0.0p8, MPICH-1.2.7p1, mpiexec-0.80.
>
> I tested with mom config as below:
>
> $logevent 255
> $loglevel 7
>
> Here is the relevant portion from the MS log.
>
> 04/10/2006 16:43:02;0008; pbs_mom;Job;scan_for_terminated;for job
> 51431.ribosome.cchmc.org, task 2, pid=20539, exitcode=0
The process is exiting without an error.
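For reference, here is a small sketch of pulling the pid and exit code out of a scan_for_terminated line like the one quoted above. The log line is copied from your message; the variable names are only for illustration, and the sed patterns assume the "pid=N, exitcode=N" layout shown in that one line.

```shell
#!/bin/sh
# Sample line taken from the quoted MOM log (joined onto one line).
log_line='04/10/2006 16:43:02;0008; pbs_mom;Job;scan_for_terminated;for job 51431.ribosome.cchmc.org, task 2, pid=20539, exitcode=0'

# Extract the numeric value that follows "pid=" and "exitcode=".
pid=$(printf '%s\n' "$log_line" | sed -n 's/.*pid=\([0-9][0-9]*\),.*/\1/p')
exitcode=$(printf '%s\n' "$log_line" | sed -n 's/.*exitcode=\([0-9][0-9]*\).*/\1/p')

echo "pid $pid exited with code $exitcode"
```

An exit code of 0 here means the task's top-level process returned normally; it was not killed by a signal before it exited.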
Are you running a command in the background or something?
What happens if you try this in an interactive job?
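To show what I mean by the background question, here is a minimal sketch, assuming a POSIX sh. The marker files and the "sleep 1; touch" command are hypothetical stand-ins for the real work in your job script. A script that backgrounds its work returns right away with status 0 while the work is still running. As far as I understand it, pbs_mom treats the job as finished once the top-level script exits and then kills whatever is left in the session, which would match a clean exit code followed by kill_task.

```shell
#!/bin/sh
# Sketch only: "sleep 1; touch ..." is a hypothetical stand-in for the real job command.
workdir=$(mktemp -d)

# Case 1: the work is backgrounded and the script ends immediately.
sh -c "(sleep 1; touch '$workdir/bg.done') >/dev/null 2>&1 &"
bg_status=$?
if [ -e "$workdir/bg.done" ]; then bg_finished=yes; else bg_finished=no; fi

# Case 2: same work, but the script waits for it before exiting.
sh -c "(sleep 1; touch '$workdir/fg.done') >/dev/null 2>&1 & wait"
fg_status=$?
if [ -e "$workdir/fg.done" ]; then fg_finished=yes; else fg_finished=no; fi

echo "backgrounded: exit=$bg_status finished=$bg_finished"
echo "waited:       exit=$fg_status finished=$fg_finished"

rm -rf "$workdir"
```

Both cases report exit status 0, but only the second has finished its work when the script returns. If your job script ends with a backgrounded command, adding a wait at the end is the first thing I would try.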
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California