[torqueusers] pbsdsh not exiting

Martin Siegert siegert at sfu.ca
Fri Feb 19 10:30:11 MST 2010


We have a user whose submission script basically looks like this:

pbsdsh script.sh
<do final cleanup>

and script.sh contains

echo "myprog finished for process $PBS_VNODENUM." 1>&2

We find that the file <jobid>.ER contains the line
myprog finished for process <proc>.
for every process started by pbsdsh. Thus all processes have finished
(the corresponding processors are idle, with no processes remaining),
but the pbsdsh process does not exit on the mom superior. As a result
<do final cleanup> never runs, and the job just hangs and blocks
resources. The processes that remain on the mom superior are:

S xyz       2718  5534   0   732   8792 02:55  0.0 00:00:00 pbsdsh script.sh
S xyz       5479  4495   0  1332  65920 Feb18  0.0 00:00:00 -bash
S xyz       5480  5479   0   732   8368 Feb18  0.0 00:00:00 pbs_demux
S xyz       5534  5479   0  1492  66132 Feb18  0.0 00:00:05 /bin/bash /var/spool/torque/mom_priv/jobs/732583.b0.SC
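To double-check the claim that every task reported in, the .ER file can be compared against the job's node file. The sketch below is a hypothetical helper (the function name check_all_reported is made up for illustration). It assumes that pbsdsh starts one task per line of $PBS_NODEFILE and that script.sh prints exactly the message shown above. It counts the distinct vnode numbers that reported completion and compares that count with the number of node-file entries.

```shell
#!/bin/bash
# Hypothetical helper: check that every vnode listed in the node file
# wrote its "myprog finished for process N." line to the job's stderr file.
# Assumptions: one node-file line per vnode (one pbsdsh task each), and the
# exact message format produced by script.sh.
check_all_reported() {
    local errfile=$1 nodefile=$2
    local expected reported
    # Number of vnodes = number of lines in the node file
    # ($(( )) strips any padding that some wc implementations add).
    expected=$(( $(wc -l < "$nodefile") ))
    # Extract the vnode numbers from matching lines, ignore any other
    # stderr output, and count each vnode only once.
    reported=$(( $(sed -n 's/^myprog finished for process \([0-9][0-9]*\)\.$/\1/p' "$errfile" \
                   | sort -u | wc -l) ))
    echo "expected=$expected reported=$reported"
    # Exit status 0 only if every vnode reported completion.
    [ "$expected" -eq "$reported" ]
}
```

Inside the job it could be called as `check_all_reported "$ERRFILE" "$PBS_NODEFILE"`, where ERRFILE is a placeholder for wherever the <jobid>.ER file lives. If the check passes while pbsdsh is still running, that supports the observation that the tasks themselves completed and only pbsdsh on the mom superior failed to return.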

Does anybody know what is causing this, and how it can be solved?
(This is with torque-2.3.7.)


