[torqueusers] killed by signal 15
Guillaume Alleon
guillaume.alleon at laposte.net
Sat Sep 9 00:48:35 MDT 2006
Usually I use torque to schedule my jobs and then use mpiexec to launch
my parallel code ;-)
This time my code is written in Java and uses a Java MPI implementation,
so I parse the $PBS_NODEFILE on the mother node, start a server node on
it, and "ssh" a java command starting my process on all nodes in the
nodefile. I have a script for doing this (attached at the end).
This works fine when using qsub -I ... but all the processes are killed
by a signal 15. Any thoughts about what's going on?
Here is my ugly script:
VRAI=1
NUM=0
NP=`wc -l $PBS_NODEFILE | awk '{print $1}'`
for i in `cat $PBS_NODEFILE`
do
    if [[ $VRAI = "1" && $HOSTNAME = $i ]]
    then
        echo "only on: $i ($VRAI)"
        SERVEUR=$i
        echo "the server is on : $SERVEUR"
        ibis-nameserver -poolserver -single&
        VRAI=0
    fi
    echo "ssh $i ibis-run -nhosts $NP -hostno $NUM -ns $SERVEUR -ns-port 9826 RunHal &"
    ssh $i ibis-run -nhosts $NP -hostno $NUM -ns $SERVEUR -ns-port 9826 RunHal &
    NUM=`expr $NUM + 1`
done
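
One guess at the cause, offered as an assumption rather than a diagnosis:
in a batch job the script above reaches its last line while every ssh is
still in the background, and Torque then cleans up whatever the job left
running, which would show up as signal 15 (SIGTERM). Below is a minimal
sketch of a launch loop that waits for the remote processes instead.
launch_all and its optional launcher argument are hypothetical names
introduced here; the ibis-run options are copied from the script above.

```shell
# Sketch only: launch_all and its LAUNCHER argument are hypothetical names,
# not part of Torque or Ibis.

# launch_all NODEFILE SERVER [LAUNCHER]
# Starts one ibis-run per nodefile line through LAUNCHER (default: ssh),
# then blocks until every one of them has exited.
launch_all() {
    local nodefile="$1"
    local server="$2"
    local launcher="${3:-ssh}"
    local np num=0 node pids=""

    # Number of processes = number of non-empty lines in the nodefile.
    np=$(grep -c . "$nodefile")

    while read -r node; do
        [ -n "$node" ] || continue
        # stdin comes from /dev/null so the launcher cannot consume
        # the rest of the nodefile that the loop is reading.
        "$launcher" "$node" ibis-run -nhosts "$np" -hostno "$num" \
            -ns "$server" -ns-port 9826 RunHal < /dev/null &
        pids="$pids $!"
        num=$((num + 1))
    done < "$nodefile"

    # Keep the job script alive until the backgrounded launchers finish
    # (assumed fix; without it the script ends right after the loop).
    wait $pids
}
```

In the job script this would be called as launch_all "$PBS_NODEFILE"
"$HOSTNAME" after starting ibis-nameserver. The third argument exists only
so the loop can be exercised with a stub in place of ssh.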
--
Guillaume ALLEON
http://guillaume.alleon.free.fr/