[torqueusers] matlab jobs unkillable
Garrick Staples
garrick at clusterresources.com
Fri Mar 23 14:31:47 MDT 2007
On Fri, Mar 23, 2007 at 03:45:54PM -0400, P Spencer Davis alleged:
> Hello,
> We've had an interesting situation involving matlab jobs
> resubmitting themselves to our queue. The cluster is running Torque
> V-2.1.6, maui v-3.2.6p19 on a 64 node RHEL 4 cluster
> (2.6.9-42.0.10.ELsmp kernel). The matlab version is 2006b. Some matlab
> jobs are continuously failing and resubmitting themselves to the queue.
> You can kill them using qdel, but then they restart with the same job
> number. I know that the jobs in question were being killed because mail
> to that effect was being sent to the user's e-mail account. The only way
> to kill the jobs was to shut down the pbs server and maui and then delete
> the job files in /var/spool/torque/server_priv/jobs. I'm including a
> copy of the script that was written to submit the jobs... any idea what
> I've misconfigured?
>
> #!/bin/bash
> time=$2
> jobname=$(basename $1)
> path_to_job=$(dirname $1)
> echo "#PBS -N $jobname
> #PBS -j oe
> #PBS -m bae
> #PBS -l walltime=$time
> cd $path_to_job
> /home/local/bin/matlab < $jobname" > /tmp/$USER.$jobname
> qsub /tmp/$USER.$jobname
> rm /tmp/$USER.$jobname
You have a qsub in your job script, so it is always submitting a new
job.
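
One way to keep the two pieces apart is sketched below: a function prints
only the text of the job script, and qsub is called once, from the wrapper,
outside that text. This is a rough sketch, not a tested fix for the cluster
described above. The function name make_job_script and the example path are
hypothetical; the PBS directives and the matlab path are copied from the
quoted script.

```shell
#!/bin/bash
# Hypothetical helper: print a PBS job script for one MATLAB input file.
#   $1 = path to the MATLAB input file
#   $2 = walltime, e.g. 01:00:00
make_job_script() {
    local jobname path_to_job
    jobname=$(basename "$1")
    path_to_job=$(dirname "$1")
    # Unquoted EOF: the variables are expanded now, when the script is generated.
    cat <<EOF
#PBS -N $jobname
#PBS -j oe
#PBS -m bae
#PBS -l walltime=$2
cd "$path_to_job"
/home/local/bin/matlab < "$jobname"
EOF
}

# Submission happens only here, in the wrapper, never in the generated text.
# qsub reads the job script from standard input when no file is given:
#   make_job_script /path/to/run1.m 01:00:00 | qsub
```

Piping into qsub also avoids the temporary file in /tmp, and printing the
generated script first makes it easy to check that it contains no qsub line
before anything is submitted.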