[torqueusers] matlab jobs unkillable

Garrick Staples garrick at clusterresources.com
Fri Mar 23 14:31:47 MDT 2007


On Fri, Mar 23, 2007 at 03:45:54PM -0400, P Spencer Davis alleged:
> Hello,
>    We've had an interesting situation involving matlab jobs
> resubmitting themselves to our queue. The cluster is running Torque
> V-2.1.6, maui v-3.2.6p19 on a 64-node RHEL 4 cluster
> (2.6.9-42.0.10.ELsmp kernel). The matlab version is 2006b. Some matlab
> jobs are continuously failing and resubmitting themselves to the queue.
> You can kill them using qdel, but then they restart with the same job
> number. I know that the jobs in question were being killed because mail
> to that effect was being sent to the user's e-mail account. The only way
> to kill the jobs was to shut down the pbs server and maui and then delete
> the job files in /var/spool/torque/server_priv/jobs. I'm including a
> copy of the script that was written to submit the jobs. Any idea what
> I've misconfigured?
> 
> #!/bin/bash
> time=$2
> jobname=$(basename $1)
> path_to_job=$(dirname $1)
> echo "#PBS -N $jobname
> #PBS -j oe
> #PBS -m bae
> #PBS -l walltime=$time
> cd $path_to_job
> /home/local/bin/matlab < $jobname" > /tmp/$USER.$jobname
> qsub /tmp/$USER.$jobname
> rm /tmp/$USER.$jobname

You have a qsub in your job script.  So it is always submitting a new
job.
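
One way to keep the two steps apart, sketched under some assumptions:
the job body is produced by one function and qsub is called only from
the submission side, so a qsub line cannot end up inside the job itself.
The function names make_job_script and submit_matlab_job and the
mktemp-based temporary file are hypothetical, not part of Torque; the
PBS directives and the matlab path are copied from the script above.

```shell
#!/bin/bash
# Sketch only: make_job_script and submit_matlab_job are hypothetical helpers.

# Print the PBS job script for one matlab input file to stdout.
# $1 = path to the matlab input file, $2 = walltime (e.g. 01:00:00)
make_job_script() {
    local jobname path_to_job
    jobname=$(basename "$1")
    path_to_job=$(dirname "$1")
    # Everything inside the here-document is the job body; it contains no qsub.
    cat <<EOF
#PBS -N $jobname
#PBS -j oe
#PBS -m bae
#PBS -l walltime=$2
cd "$path_to_job"
/home/local/bin/matlab < "$jobname"
EOF
}

# Write the job body to a temporary file and submit it once.
# qsub runs here, on the submission host, never inside the job.
submit_matlab_job() {
    local script
    script=$(mktemp "${TMPDIR:-/tmp}/matlabjob.XXXXXX") || return 1
    make_job_script "$1" "$2" > "$script"
    qsub "$script"
    rm -f "$script"
}
```

It would be called from the login node as, for example,
submit_matlab_job /path/to/sim.m 01:00:00 (the path and walltime are
placeholders). Running make_job_script on its own prints the job body,
which makes it easy to check what the server will actually execute.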


