[torqueusers] qdel and remaining parts of jobs
bill
cluster.bill at alinto.com
Tue Sep 26 04:17:28 MDT 2006
Hello
I noticed that qdel doesn't clean up everything a job started.
For example, take this simple script:
#PBS -S /bin/bash
#PBS -l "nodes=2:ppn=2"
NCPU=`wc -l < $PBS_NODEFILE`
lamboot $PBS_NODEFILE
cd /home/simu1/hello
mpirun.lam -np ${NCPU} hello-30s
lamhalt $PBS_NODEFILE
I can launch this job. If I qdel it, the LAM daemons aren't stopped.
After a while, I end up with a lot of running LAM daemons that don't
belong to any job.
I read the qdel man page: qdel sends a SIGTERM, then a SIGKILL.
I can trap the TERM signal in order to clean up the job (I tried, and it
works), but I don't think every user will think of doing that.
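lamboot $PBS_NODEFILE
cd /home/simu1/hello
# Run in the background so the shell can handle signals while it waits.
mpirun.lam -np ${NCPU} hello-30s &
pid=$!
# On SIGTERM from qdel: stop mpirun, then shut down the LAM daemons.
trap '
kill -15 $pid
lamhalt $PBS_NODEFILE
' TERM
wait
lamhalt $PBS_NODEFILE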
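
The trap pattern above could be wrapped in a small reusable function so that users do not have to write it by hand in every job script. The following is only a sketch under stated assumptions: run_with_cleanup and its rwc_* variables are hypothetical names, not part of TORQUE or LAM, and the sketch assumes the cleanup finishes before qdel's follow-up SIGKILL arrives. It forwards SIGTERM to the workload, waits for it to exit, and then runs the cleanup command once.

```shell
#!/bin/sh
# Sketch only: run_with_cleanup is a hypothetical helper, not part of
# TORQUE or LAM. It runs a workload in the background and then runs a
# cleanup command, both on normal exit and after a SIGTERM.
run_with_cleanup() {
    rwc_cleanup=$1          # cleanup command line, passed as one string
    shift
    rwc_got_term=0

    "$@" &                  # start the workload in the background
    rwc_pid=$!

    # qdel sends SIGTERM first: remember it and forward it to the workload.
    trap 'rwc_got_term=1; kill -TERM "$rwc_pid" 2>/dev/null' TERM

    wait "$rwc_pid"
    rwc_status=$?
    # A trapped signal interrupts wait, so wait again to reap the workload.
    if [ "$rwc_got_term" -eq 1 ]; then
        wait "$rwc_pid" 2>/dev/null
        rwc_status=$?
    fi

    trap - TERM             # restore default SIGTERM handling
    eval "$rwc_cleanup"     # e.g. shut down the LAM daemons
    return "$rwc_status"
}
```

In the job script above, this would be called after lamboot as: run_with_cleanup 'lamhalt $PBS_NODEFILE' mpirun.lam -np ${NCPU} hello-30s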
Have you experienced this problem? What is your policy? Is there a way to
force qdel to terminate all of a job's processes?
Thanks