[torqueusers] qdel and remaining parts of jobs

David Golden dgolden at cp.dias.ie
Tue Sep 26 04:49:06 MDT 2006


On Tuesday 26 September 2006 11:17, bill wrote:
> Hello
>
> I noticed that qdel doesn't wipe out everything.
>
> for example, an easy script:
>
> #PBS -S /bin/bash
> #PBS -l "nodes=2:ppn=2"
> NCPU=`wc -l < $PBS_NODEFILE`
> lamboot $PBS_NODEFILE
> cd /home/simu1/hello
> mpirun.lam -np ${NCPU} hello-30s
> lamhalt $PBS_NODEFILE
>
>
> I can launch this job. If I qdel it, the lam daemons aren't stopped.
>
> After some time, I end up with a lot of running lam daemons that don't
> belong to any job.
>

Looks like you may _not_ be using LAM's TM (torque/pbs integration) support?
If so, you're effectively working outside the batch system, and torque
can't track your tasks. (You're also using the "old" convention; recent LAM
mpiexec includes an automagic lamboot and lamhalt, AFAIK.)
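
For what it's worth, a rough sketch of the "new" convention, not tested
against your setup. It assumes your LAM ships an mpiexec that accepts
-machinefile (one-shot boot, run, halt) and that hello-30s lives where your
script says it does. The count_slots and run_job helper names and the MPIEXEC
override are made up for illustration. Note that unless LAM was built with TM
support, the daemons are still started outside torque, so this alone won't
make qdel clean them up.

```shell
#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=2

# Hypothetical helper: one MPI process per line of the torque node file.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Hypothetical helper: let mpiexec do the lamboot/lamhalt itself.
run_job() {
    local ncpu
    ncpu=$(count_slots "$PBS_NODEFILE")
    cd /home/simu1/hello || return 1
    # Assumption: LAM's mpiexec boots from the given hostfile, runs the
    # program, then halts the LAM runtime when the program exits.
    "${MPIEXEC:-mpiexec}" -machinefile "$PBS_NODEFILE" -n "$ncpu" hello-30s
}

# Only launch when torque has actually handed us a node file.
if [ -n "${PBS_NODEFILE:-}" ]; then
    run_job
fi
```

Submit it with qsub as before; outside a job (no $PBS_NODEFILE) it does
nothing.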

LAM used to be able to use the TM API, but LAM 7.1.2 (the last stable
version) didn't include support for the new torque 2 library organisation
(pbs-config / libtorque). Dunno about later versions, but LAM is thoroughly
in "maintenance mode": the developers have mostly moved to the OpenMPI
project, so I'm not sure anyone will have actually patched it. Ask the LAM
folk, maybe.




