[torqueusers] qdel and remaining parts of jobs

bill cluster.bill at alinto.com
Tue Sep 26 04:17:28 MDT 2006


Hello

I noticed that qdel doesn't wipe out everything.

For example, take this simple script:

#PBS -S /bin/bash
#PBS -l "nodes=2:ppn=2"
NCPU=`wc -l < $PBS_NODEFILE`
lamboot $PBS_NODEFILE
cd /home/simu1/hello
mpirun.lam -np ${NCPU} hello-30s
lamhalt $PBS_NODEFILE


I can launch this job, but if I qdel it, the LAM daemons aren't stopped.

After some time, I end up with a lot of running LAM daemons that don't 
belong to any job.

I read the qdel manpage: qdel sends a SIGTERM, then a SIGKILL.

I can trap the TERM signal in order to clean up the job (I tried, and it 
works), but I don't think every user will think of doing this:

lamboot $PBS_NODEFILE
cd /home/simu1/hello
mpirun.lam -np ${NCPU} hello-30s &
pid=$!
trap '
  kill -15 $pid
  lamhalt $PBS_NODEFILE
' TERM
wait
lamhalt $PBS_NODEFILE
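
To make the pattern easier to try outside of a batch job, here is a minimal
self-contained sketch of the same idea. It is only an illustration under
stated assumptions: `sleep 30` stands in for mpirun.lam, the hypothetical
`cleanup` function stands in for lamhalt and only writes a marker file, and
a plain `kill -TERM` stands in for the SIGTERM that qdel sends.

```shell
# Stand-ins (assumptions, not part of the real job script):
#   sleep 30   replaces mpirun.lam
#   cleanup    replaces lamhalt; it only records that it ran

marker=$(mktemp)

(
    cleanup() { echo cleaned > "$marker"; }

    # On SIGTERM: stop the child, run the cleanup, then exit.
    trap 'kill -TERM "$pid" 2>/dev/null; cleanup; exit 143' TERM

    sleep 30 &      # the long-running "MPI" program
    pid=$!
    wait "$pid"     # a trapped signal interrupts this wait
    cleanup         # normal-exit path
) &
job_pid=$!

sleep 1                   # give the subshell time to install its trap
kill -TERM "$job_pid"     # simulate the first signal sent by qdel
wait "$job_pid" || status=$?

cat "$marker"             # shows whether the cleanup ran
```

Installing the trap before starting the background program means there is
no window in which a TERM could arrive with no handler in place.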

Have you experienced this problem? What is your policy? Is there a way to 
force qdel to terminate these leftover processes?

Thanks

