[torqueusers] pbsdsh not exiting
Martin Siegert
siegert at sfu.ca
Fri Feb 19 10:56:13 MST 2010
Some additional information:
When I strace the running pbsdsh process and then attempt to
kill it (with SIGTERM), I see the following from strace:
# strace -f -p 2718
Process 2718 attached - interrupt to quit
poll([{fd=3, events=POLLIN|POLLHUP}], 1, 2147483647) = -1 EINTR (Interrupted system call)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
poll([{fd=3, events=POLLIN|POLLHUP}], 1, 2147483647) = -1 EINTR (Interrupted system call)
... and pbsdsh still does not exit.
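The rt_sigreturn call in the trace is what the kernel logs when a signal
handler returns, so the trace suggests that pbsdsh catches SIGTERM and then
simply resumes the same poll() on fd 3. One way to check this on Linux is to
read the SigCgt mask in /proc/<pid>/status. The helper below is only a sketch:
the name sigterm_caught is made up for illustration, and it assumes bash and a
Linux /proc filesystem.

```shell
#!/bin/bash
# sigterm_caught PID   (hypothetical helper, not part of TORQUE)
# Succeeds (exit 0) if the process has a handler installed for SIGTERM.
sigterm_caught() {
    local mask
    # SigCgt is a hex bitmask of caught signals; signal n is stored in bit n-1.
    mask=$(awk '/^SigCgt:/ {print $2}' "/proc/$1/status" 2>/dev/null) || return 2
    [ -n "$mask" ] || return 2
    # SIGTERM is signal 15, so test bit 14.
    (( (16#$mask >> 14) & 1 ))
}

# Example with the PID from the trace above:
#   sigterm_caught 2718 && echo "SIGTERM is caught, so kill -TERM alone may not end it"
```

If the check succeeds, only a signal that cannot be caught (SIGKILL) is
guaranteed to end the process. Running ls -l /proc/2718/fd/3 would also show
what kind of descriptor the poll() is waiting on.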
Cheers,
Martin
On Fri, Feb 19, 2010 at 09:30:11AM -0800, Martin Siegert wrote:
> Hi,
>
> we have a user who basically has the following submission script:
>
> cd $PBS_O_WORKDIR
> pbsdsh script.sh
> <do final cleanup>
>
> and script.sh contains
>
> cd run$PBS_VNODENUM
> myprog
> echo "myprog finished for process $PBS_VNODENUM." 1>&2
> exit
>
> We find that the file <jobid>.ER contains all lines
> myprog finished for process <proc>.
> for all processes started by pbsdsh. Thus, all processes finished
> (and the corresponding processors are idle without any processes
> remaining), but the pbsdsh process does not exit on the mom
> superior and hence the <do final cleanup> never gets done and the
> job just hangs and blocks resources. The processes that remain on
> the mom superior are:
>
> S xyz 2718 5534 0 732 8792 02:55 0.0 00:00:00 pbsdsh script.sh
> S xyz 5479 4495 0 1332 65920 Feb18 0.0 00:00:00 -bash
> S xyz 5480 5479 0 732 8368 Feb18 0.0 00:00:00 pbs_demux
> S xyz 5534 5479 0 1492 66132 Feb18 0.0 00:00:05 /bin/bash /var/spool/torque/mom_priv/jobs/732583.b0.SC
>
> Does anybody know what is causing this? And how can this be solved?
> (this is with torque-2.3.7)
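Until the cause is known, one possible stopgap is to put a time limit on
pbsdsh in the job script so that the final cleanup still runs. The following
is a minimal sketch under stated assumptions, not a fix for the underlying
problem: run_with_watchdog is a hypothetical helper written for this
illustration, it assumes bash, and it uses SIGKILL because that signal cannot
be caught. Whether killing pbsdsh this way leaves the job in a clean state
has not been tested here.

```shell
#!/bin/bash
# run_with_watchdog LIMIT_SECONDS CMD [ARGS...]   (hypothetical helper)
# Runs CMD and sends it SIGKILL if it is still alive after LIMIT_SECONDS.
run_with_watchdog() {
    local limit=$1 cmd_pid dog_pid status
    shift
    "$@" &                          # start the command, e.g. pbsdsh script.sh
    cmd_pid=$!
    # Watchdog: wait for the limit, then kill the command if it still exists.
    ( sleep "$limit"; kill -KILL "$cmd_pid" 2>/dev/null ) &
    dog_pid=$!
    wait "$cmd_pid"                 # returns when the command exits or is killed
    status=$?
    kill "$dog_pid" 2>/dev/null     # command ended first: cancel the watchdog
    wait "$dog_pid" 2>/dev/null
    return "$status"                # status above 128 means killed by a signal
}
```

In the submission script above, the pbsdsh line would then become something
like "run_with_watchdog 86400 pbsdsh script.sh", followed by the cleanup
commands. The limit has to be longer than the longest expected run of myprog,
since tasks that are still working when it expires would lose their pbsdsh.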
>
> Cheers,
> Martin
>
> --
> Martin Siegert
> Head, Research Computing
> WestGrid Site Lead
> IT Services phone: 778 782-4691
> Simon Fraser University fax: 778 782-4242
> Burnaby, British Columbia email: siegert at sfu.ca
> Canada V5A 1S6