[torqueusers] Running a script after job completion
Chaitanya Krishna
icymist at gmail.com
Fri Oct 12 13:21:04 MDT 2007
Hello Glen,
Maybe I made the question too simple.
I use the following two scripts (actually given to me by the
administrator) to submit a job.
---------------------------
submit_job: this one sets up all the required environment variables, I guess.
#!/bin/bash
export NUMBER_OF_NODES=1
export NUMBER_OF_CPUS_PER_NODE=1
export TOTAL_CPUS=$(($NUMBER_OF_NODES * $NUMBER_OF_CPUS_PER_NODE))
export SUBMIT_DIRECTORY=$PWD
echo "submit directory: $SUBMIT_DIRECTORY"
echo "total number of cpus: $TOTAL_CPUS"
echo "number of nodes: $NUMBER_OF_NODES"
echo "number of cpus per node: $NUMBER_OF_CPUS_PER_NODE"
qsub -m abe -M icymist at gmail.com -N present -W depend=afterany \
    -l nodes=$NUMBER_OF_NODES:ppn=$NUMBER_OF_CPUS_PER_NODE job
------------------------------------
and job has the following lines:
#! /bin/bash
#PBS -V
# -V means "inherit environment variables from parent shell"
cd $SUBMIT_DIRECTORY
echo "submit directory is: $SUBMIT_DIRECTORY"
echo "pbs_nodefile is: $PBS_NODEFILE"
echo "The nodefile lists the following nodes:"
cat $PBS_NODEFILE
export LAMRSH="ssh -x"
lamboot $PBS_NODEFILE
echo "Total number of cpus: $TOTAL_CPUS"
time /usr/local/lam/bin/mpirun -np $TOTAL_CPUS \
    /usr/local/vasp/bin/vaspmpi > vasp.out
echo "Done" > done.txt
./check_convergence.sh
lamhalt
--------------------------
And I submit the job by doing $ ./submit_job
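The values that submit_job computes can be checked without submitting anything. Below is a minimal dry-run sketch, assuming the same variable names as in submit_job above; RESOURCE_REQUEST is a hypothetical helper variable that is not in the original script, and nothing here calls qsub.

```shell
#!/bin/bash
# Dry run: compute the same values as submit_job, but only print them.
NUMBER_OF_NODES=1
NUMBER_OF_CPUS_PER_NODE=1

# Same arithmetic expansion as in submit_job.
TOTAL_CPUS=$((NUMBER_OF_NODES * NUMBER_OF_CPUS_PER_NODE))

# Hypothetical helper variable (not in the original script): the string
# that would follow "-l" on the qsub command line.
RESOURCE_REQUEST="nodes=${NUMBER_OF_NODES}:ppn=${NUMBER_OF_CPUS_PER_NODE}"

echo "total number of cpus: $TOTAL_CPUS"
echo "resource request: -l $RESOURCE_REQUEST"
```

Changing the two numbers at the top and rerunning shows what the resource request would look like for a larger job.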
The point is that check_convergence.sh checks for convergence and then
runs $ ./submit_job again.
What happens now is that I get the following error message in job.e*:
real 0m34.648s
user 0m0.003s
sys 0m0.019s
qsub: Job rejected by all possible destinations
I suspect that the environment variables are not being set up properly.
One more thing: the job does echo "Done" to done.txt and does check for
convergence, but it doesn't submit the job again.
Any clues?
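One way to test the suspicion about the environment might be to log the relevant variables from inside the job, just before check_convergence.sh calls ./submit_job. The following is only a sketch: report_env is a hypothetical helper, not something from the administrator's scripts, and the variable names are the ones exported by submit_job. It also prints the host name and whether qsub is on the PATH, since the resubmission runs wherever the job itself is running.

```shell
#!/bin/bash
# Hypothetical diagnostic helper: print each named variable,
# or flag it when it is unset or empty.
report_env() {
    local name
    for name in "$@"; do
        if [ -z "${!name:-}" ]; then
            echo "$name is unset or empty"
        else
            echo "$name=${!name}"
        fi
    done
}

# Variables that submit_job exports and the job script relies on.
report_env SUBMIT_DIRECTORY TOTAL_CPUS NUMBER_OF_NODES NUMBER_OF_CPUS_PER_NODE

# Where the resubmission would be attempted from, and whether qsub is visible.
echo "running on host: ${HOSTNAME:-unknown}"
command -v qsub >/dev/null || echo "qsub not found in PATH"
```

If every variable is reported with a value and qsub is found, the environment is probably not the problem, and the rejection would have to come from somewhere else.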
Thanks for the help.
Chaitanya.