[torqueusers] pbsdsh not exiting

Martin Siegert siegert at sfu.ca
Fri Feb 19 10:30:11 MST 2010


Hi,

we have a user who basically has the following submission script:

cd $PBS_O_WORKDIR
pbsdsh script.sh
<do final cleanup>

and script.sh contains

cd run$PBS_VNODENUM
myprog
echo "myprog finished for process $PBS_VNODENUM." 1>&2
exit

We find that the file <jobid>.ER contains the line
myprog finished for process <proc>.
for every process started by pbsdsh. Thus all processes have finished
(the corresponding processors are idle, with no processes
remaining), but the pbsdsh process does not exit on the mom
superior. Hence the <do final cleanup> step never runs, and the
job just hangs and blocks resources. The processes that remain on
the mom superior are:

S xyz       2718  5534   0   732   8792 02:55  0.0 00:00:00 pbsdsh script.sh
S xyz       5479  4495   0  1332  65920 Feb18  0.0 00:00:00 -bash
S xyz       5480  5479   0   732   8368 Feb18  0.0 00:00:00 pbs_demux
S xyz       5534  5479   0  1492  66132 Feb18  0.0 00:00:05 /bin/bash /var/spool/torque/mom_priv/jobs/732583.b0.SC

Does anybody know what is causing this, and how it can be solved?
(This is with torque-2.3.7.)
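
As a possible stopgap until the cause is known, the pbsdsh call could be
wrapped in a time limit so that the cleanup step is still reached. The
following is only a sketch: run_with_timeout is a hypothetical helper, not
part of TORQUE, and the time limit is an assumption that would have to be
chosen per job. It does not explain or fix the hang itself, and the effect
of killing pbsdsh on any tasks that are still running has not been verified.

```shell
#!/bin/bash
# Hypothetical helper (not provided by TORQUE): run a command in the
# background and terminate it if it is still alive after a time limit,
# so that the rest of the calling script continues either way.
# Usage: run_with_timeout <seconds> <command> [args...]
run_with_timeout() {
    local limit=$1
    shift
    "$@" &                        # start the command in the background
    local pid=$!
    local waited=0
    # Poll once per second while the command is still running.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null   # limit reached: send SIGTERM
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Collect the exit status: the command's own status if it finished,
    # or a non-zero status if it was terminated by the signal.
    wait "$pid"
}
```

In the submission script above, the pbsdsh line would then become something
like "run_with_timeout 86400 pbsdsh script.sh" (the 86400 seconds are an
arbitrary placeholder), followed by the cleanup as before. A non-zero
return value can be used to log that the limit was hit.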

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

