[torqueusers] matlab jobs unkillable

P Spencer Davis psdavis at bsu.edu
Fri Mar 23 13:45:54 MDT 2007


Hello,
    We've had an interesting situation involving matlab jobs
resubmitting themselves to our queue. We are running Torque 2.1.6 and
Maui 3.2.6p19 on a 64-node RHEL 4 cluster (2.6.9-42.0.10.ELsmp kernel).
The matlab version is 2006b. Some matlab jobs continuously fail and
resubmit themselves to the queue. You can kill them with qdel, but they
then restart with the same job number. I know that the jobs in question
were being killed because mail to that effect was being sent to the
user's e-mail account. The only way to kill the jobs was to shut down
the pbs server and maui and then delete the job files in
/var/spool/torque/server_priv/jobs. I'm including a copy of the script
that was written to submit the jobs. Any idea what I've misconfigured?

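#!/bin/bash
# Arguments: $1 = path to the matlab input file, $2 = walltime
time=$2
jobname=$(basename "$1")
path_to_job=$(dirname "$1")
# Write a temporary PBS job script, submit it, then remove it.
echo "#PBS -N $jobname
#PBS -j oe
#PBS -m bae
#PBS -l walltime=$time
cd $path_to_job
/home/local/bin/matlab < $jobname" > "/tmp/$USER.$jobname"
qsub "/tmp/$USER.$jobname"
rm "/tmp/$USER.$jobname"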
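
For comparison, here is a minimal sketch of the same wrapper with two changes. It is not a confirmed fix for the restarts described above. It assumes that qsub reads the job script from standard input when no file is given, which removes the temporary file in /tmp, and it adds "#PBS -r n", the qsub option that marks a job as not rerunnable. Whether the rerunnable setting has anything to do with the behaviour seen here is only a guess. The function name make_job_script is made up for this example.

```shell
#!/bin/bash
# Sketch only: same job as the wrapper above, restructured so the
# generated job script can be inspected before it is submitted.

# Print a PBS job script for the matlab input file $1 with walltime $2.
# make_job_script is a hypothetical helper name, not part of Torque.
make_job_script() {
    local input=$1 walltime=$2
    local jobname path_to_job
    jobname=$(basename "$input")
    path_to_job=$(dirname "$input")
    cat <<EOF
#PBS -N $jobname
#PBS -j oe
#PBS -m bae
#PBS -r n
#PBS -l walltime=$walltime
cd "$path_to_job" || exit 1
/home/local/bin/matlab < "$jobname"
EOF
}

# Submit only when run directly with exactly two arguments; sourcing
# this file just defines the function. Assumes qsub is on PATH.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 2 ]]; then
    make_job_script "$1" "$2" | qsub
fi
```

Calling make_job_script with an input path and a walltime prints the generated job script, so it can be checked by eye before anything reaches the queue.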

