[torqueusers] job only runs on 1 cpu
jand at uvic.ca
Sun Jul 27 17:25:08 MDT 2008
I have a small cluster with 3 nodes, each node has 2 CPUs with 4 cores each.
I have been using the cluster for a few months now and it works mostly great
with pbs and open-mpi.
One problem I have been running into for a while is the following:
Starting a job with a script containing
#PBS -l nodes=1:ppn=8
works perfectly. The job starts on 1 node and uses all 8 cores. However, a
script containing
#PBS -l nodes=2:ppn=8
will start the job, and qstat -f tells me that it is running on 16 cores, but
checking with "top" shows that the job is only running on 1 core on 1
node (the node listed second in the nodes file). I could not find
any errors in the MOM logs.
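A check that could be added to the job script to see which slots Torque actually hands to the job. This is only a sketch, not taken from the real job script: it assumes Torque sets PBS_NODEFILE in the usual way (one hostname per allocated core), and the helper name count_slots is made up for illustration.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# count_slots (hypothetical helper): print "<slots> <hostname>" for each
# host that appears in the given node file.
count_slots() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# Inside a Torque job, PBS_NODEFILE should point at the list of allocated
# slots. Outside a job the variable is unset and nothing is printed.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    count_slots "$PBS_NODEFILE"
fi
```

If this lists both hosts with 8 slots each, the allocation itself is as requested and the place to look is how the MPI processes are launched across those hosts.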
Any help would be much appreciated.
Jan Dettmer, Postdoctoral Fellow
School of Earth and Ocean Sciences, University of Victoria
Victoria, BC V8W 3P6
office: (250) 472-4342 email: jand at uvic.ca