[torqueusers] killed by signal 15

Guillaume Alleon guillaume.alleon at laposte.net
Sat Sep 9 00:48:35 MDT 2006


Usually I use torque to schedule my jobs and then use mpiexec to launch
my parallel code ;-)
This time my code is written in Java and uses a Java MPI implementation,
so I parse the $PBS_NODEFILE on the mother node, start a server node on
it, and "ssh" a java command that starts my process on every node in the
nodefile. I have a script for doing this (attached at the end).

This works fine when using qsub -I ... but all the processes are killed
by signal 15. Any thoughts about what's going on?

Here is my ugly script:
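# VRAI is 1 until the name server has been started on the mother node
VRAI=1
# NUM is the rank passed to each ibis-run instance
NUM=0
# NP is the number of entries in the nodefile
NP=`wc -l $PBS_NODEFILE | awk '{print $1}'`
for i in `cat $PBS_NODEFILE`
do
  # start the name server once, on the node running this script
  if [[ $VRAI = "1" && $HOSTNAME = $i ]]
  then
    echo "only on: $i ($VRAI)"
    SERVEUR=$i
    echo "the server is on : $SERVEUR"
    ibis-nameserver -poolserver -single&
    VRAI=0
  fi
  # launch one process per nodefile entry, in the background
  echo "ssh $i ibis-run -nhosts $NP -hostno $NUM -ns $SERVEUR -ns-port 9826 RunHal &"
  ssh $i ibis-run -nhosts $NP -hostno $NUM -ns $SERVEUR -ns-port 9826 RunHal &
  NUM=`expr $NUM + 1`
done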
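
One thing that may be worth checking (an assumption, not a confirmed
diagnosis): every command in the script above is started with `&`, so in
a batch job the script itself finishes as soon as the loop ends, and the
batch system normally sends SIGTERM (signal 15) to whatever the job left
running. An interactive `qsub -I` shell stays open, which would hide the
problem. The sketch below keeps the same `ibis-run` command line but
waits for the ssh processes before returning. `launch_all`, the `LAUNCH`
override and the optional second argument are hypothetical names
introduced here for illustration, and the name server is assumed to be
started beforehand as in the original script.

```shell
#!/bin/bash
# Sketch only: same per-node launch as the original loop, plus a final wait.
# LAUNCH is a hypothetical override (default: ssh) so the loop can be dry-run.

launch_all() {
  local nodefile=$1
  local server=${2:-$(hostname)}   # name server host, defaults to this node
  local launch=${LAUNCH:-ssh}
  local np num=0 node pids=()

  # number of entries in the nodefile (arithmetic strips any padding from wc)
  np=$(( $(wc -l < "$nodefile") ))

  while read -r node; do
    # stdin comes from /dev/null so the launcher cannot eat the nodefile
    $launch "$node" ibis-run -nhosts "$np" -hostno "$num" \
      -ns "$server" -ns-port 9826 RunHal < /dev/null &
    pids+=("$!")
    num=$((num + 1))
  done < "$nodefile"

  # block until every launched process has exited, so the job script
  # does not return while they are still running
  wait "${pids[@]}"
}
```

It would be called as `launch_all "$PBS_NODEFILE"` at the end of the job
script; setting `LAUNCH=echo` prints the commands instead of running them.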


-- 
Guillaume ALLEON
http://guillaume.alleon.free.fr/



