[torqueusers] pbsdsh not exiting

Martin Siegert siegert at sfu.ca
Fri Feb 19 10:56:13 MST 2010


Some additional information:
when I attach strace to the running pbsdsh process and then try to
kill the process, strace shows the following:

# strace -f -p 2718
Process 2718 attached - interrupt to quit
poll([{fd=3, events=POLLIN|POLLHUP}], 1, 2147483647) = -1 EINTR (Interrupted system call)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = -1 EINTR (Interrupted system call)
poll([{fd=3, events=POLLIN|POLLHUP}], 1, 2147483647) = -1 EINTR (Interrupted system call)

... so the SIGTERM is delivered, but pbsdsh goes straight back into
poll() and still does not exit.
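
A possible follow-up, as a sketch only: the poll() call in the strace
output waits on file descriptor 3, so it may help to check what that
descriptor refers to. The helper name fd_target below is hypothetical,
and the approach assumes a Linux /proc filesystem. It is a diagnostic
aid, not a fix.

```shell
#!/bin/bash
# fd_target PID FD: print what file descriptor FD of process PID refers to.
# Hypothetical helper for illustration; relies on the Linux /proc filesystem,
# where /proc/PID/fd/FD is a symbolic link to the open file, pipe or socket.
fd_target() {
    local pid=$1 fd=$2
    readlink "/proc/${pid}/fd/${fd}"
}

# Example, using the PID and descriptor from the strace output
# (run as the job owner or as root on the node in question):
#   fd_target 2718 3
```

For a regular file the result is a path; for a socket it has the form
socket:[inode], which can then be matched against the output of a tool
such as lsof or netstat to see what is on the other end.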

Cheers,
Martin

On Fri, Feb 19, 2010 at 09:30:11AM -0800, Martin Siegert wrote:
> Hi,
> 
> we have a user who basically has the following submission script:
> 
> cd $PBS_O_WORKDIR
> pbsdsh script.sh
> <do final cleanup>
> 
> and script.sh contains
> 
> cd run$PBS_VNODENUM
> myprog
> echo "myprog finished for process $PBS_VNODENUM." 1>&2
> exit
> 
> We find that the file <jobid>.ER contains all lines
> myprog finished for process <proc>.
> for all processes started by pbsdsh. Thus, all processes finished
> (and the corresponding processors are idle, with no processes
> remaining), but the pbsdsh process on the mother superior does not
> exit. Hence the <do final cleanup> step never runs, and the job
> just hangs and blocks resources. The processes that remain on
> the mother superior are:
> 
> S xyz       2718  5534   0   732   8792 02:55  0.0 00:00:00 pbsdsh script.sh
> S xyz       5479  4495   0  1332  65920 Feb18  0.0 00:00:00 -bash
> S xyz       5480  5479   0   732   8368 Feb18  0.0 00:00:00 pbs_demux
> S xyz       5534  5479   0  1492  66132 Feb18  0.0 00:00:05 /bin/bash /var/spool/torque/mom_priv/jobs/732583.b0.SC
> 
> Does anybody know what is causing this? And how can this be solved?
> (this is with torque-2.3.7)
> 
> Cheers,
> Martin
> 
> -- 
> Martin Siegert
> Head, Research Computing
> WestGrid Site Lead
> IT Services                                phone: 778 782-4691
> Simon Fraser University                    fax:   778 782-4242
> Burnaby, British Columbia                  email: siegert at sfu.ca
> Canada  V5A 1S6
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

