[torqueusers] multiple job array dependency bug with simple test case script

Tom Asbury tom.asbury at sequentainc.com
Mon Aug 13 11:55:42 MDT 2012



I've boiled down what I believe is a significant job array dependency bug to a simple test case.
The problem is that job array dependencies are not honored when a job
depends on more than one array.

The script below creates two small sleep jobs of different lengths and submits
each as a two-element array. A final wrap-up job is submitted with a dependency
on both job arrays finishing successfully. However, the dependent job does not
wait for both arrays to exit successfully before starting. Instead, it starts
early, as soon as the shorter (60-second) array finishes:

   sleep120_array           sleep60_array
        |                       |
        |                       |
        |                       -
        |                     done
        |                dep job starts
        |                    *bug*
        |
       ---
       done
dep job should start here


#--------SH SCRIPT START------
cat << EOF > sleeper60
sleep 60
EOF
cat << EOF > sleeper120
sleep 120
EOF

JOB1=`cat sleeper120 | qsub -t 1-2`
JOB2=`cat sleeper60 | qsub -t 1-2`
echo "cat sleeper60 | qsub -W depend=afterokarray:${JOB1}:${JOB2}"
cat sleeper60 | qsub -W depend=afterokarray:${JOB1}:${JOB2}
# --------SH SCRIPT END------

Running the script above gives something like the following:

> sh SCRIPT
cat sleeper60 | qsub -W depend=afterokarray:55349[].madrid:55350[].madrid
55351.madrid

-- after 90 seconds --

> qstat
55349[].madrid             STDIN            tasbu                  0 R batch
55350[].madrid             STDIN            tasbu                  0 C batch
55351.madrid               STDIN            tasbu                  0 R batch

55351.madrid should not be running!
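
For anyone who wants to repeat this check without reading the listing by eye,
here is a minimal sketch of a filter over qstat output. It assumes the default
column layout shown above (job ID in column 1, state in column 5). The function
name check_dep_started_early is hypothetical and not part of Torque.

```shell
# Hypothetical helper (not a Torque command): reads qstat-style lines on stdin.
#   $1        = ID of the dependent job
#   $2 ...    = IDs of the job arrays it depends on
# Prints "BUG" if the dependent job is running (R) or complete (C) while any
# listed array still shows a state other than C; otherwise prints "OK".
# Assumption: job ID is field 1 and job state is field 5, as in the listing above.
check_dep_started_early() {
    dep="$1"
    shift
    awk -v dep="$dep" -v arrays="$*" '
        BEGIN {
            # Build a lookup table of the array job IDs to watch.
            n = split(arrays, a, " ")
            for (i = 1; i <= n; i++) want[a[i]] = 1
        }
        $1 == dep    { dep_state = $5 }
        ($1 in want) { if ($5 != "C") pending = 1 }
        END {
            if ((dep_state == "R" || dep_state == "C") && pending)
                print "BUG"
            else
                print "OK"
        }'
}
```

Usage, with the job IDs quoted so the shell does not treat [] as a glob:

> qstat | check_dep_started_early 55351.madrid '55349[].madrid' '55350[].madrid'

Fed the three qstat lines above, this prints BUG.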

Torque version: 3.0.5

A solution to this problem is crucial to our pipeline and I would
appreciate any fixes / workarounds.

Thanks -
Tom
 
	
	
 

