[torqueusers] job only runs on 1 cpu
Jan Dettmer
jand at uvic.ca
Mon Jul 28 15:58:54 MDT 2008
Thanks for the tip.
I just recompiled with --with-tm.
Still the same problem.
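
One way to confirm that the rebuilt Open MPI is the one actually being picked up is to look for TM components in the output of ompi_info. The following is only a sketch: the helper name has_tm_support is made up for illustration, and the pattern assumes that ompi_info lists each component on a line of the form "MCA <framework>: tm (...)", which may differ between Open MPI versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or TORQUE): reads ompi_info
# output on stdin and succeeds if any MCA component named "tm" is listed.
# Assumption: components appear as lines like "MCA <framework>: tm (...)".
has_tm_support() {
    grep -Eq 'MCA [a-z]+: tm( |$)'
}

# Only query the real installation when ompi_info is on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "TM components found"
    else
        echo "no TM components found; --with-tm may not have taken effect"
    fi
fi
```

If no TM component shows up, the mpiexec on the PATH is probably still the old build.
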
With #PBS -l nodes=1:ppn=8 the job runs fine on 8 CPUs on one node
(without the -np option in the mpiexec command).
With #PBS -l nodes=2:ppn=8 it still starts on only one CPU on one node.
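
To see what TORQUE actually handed to the job, the node file can be inspected from inside the job script before mpiexec runs. This is a sketch, assuming the usual TORQUE behaviour of listing one host name per allocated slot in $PBS_NODEFILE. summarize_nodefile and ./my_mpi_program are placeholder names, and passing -np and -machinefile explicitly is the manual fallback for a build without working TM support.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# Hypothetical helper: count the slots (lines) and distinct hosts in a node file.
summarize_nodefile() {
    slots=$(wc -l < "$1" | tr -d ' ')
    hosts=$(sort -u "$1" | wc -l | tr -d ' ')
    echo "slots=$slots hosts=$hosts"
}

# PBS_NODEFILE is only set inside a running TORQUE job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    summarize_nodefile "$PBS_NODEFILE"
    # Manual fallback: tell mpiexec the process count and host list explicitly.
    # ./my_mpi_program is a placeholder for the real executable.
    np=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
    mpiexec -np "$np" -machinefile "$PBS_NODEFILE" ./my_mpi_program
fi
```

For a node file with 8 lines for each of two hosts the helper reports slots=16 hosts=2. Anything else would point at the allocation rather than at mpiexec.
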
Cheers, Jan
James A. Peltier wrote:
> Did you compile Open-MPI with the --with-tm option enabled? If not,
> Open-MPI doesn't have a clue about the options passed through PBS and
> you must specify the -np option manually. qstat -f will only show
> that you have requested 16 cores; it doesn't really know that the job
> is not using all 16 cores.
>
> On Sun, 27 Jul 2008, Jan Dettmer wrote:
>
>> Hi all,
>>
>> I have a small cluster with 3 nodes; each node has 2 CPUs with 4 cores
>> each.
>> I have been using the cluster for a few months now and it works mostly
>> great with pbs and open-mpi.
>>
>> One problem I have been running into for a while is the following:
>>
>> Starting a job with a script containing
>> #PBS -l nodes=1:ppn=8
>> works perfectly. The job starts on 1 node on all 8 cores.
>>
>> However
>> #PBS -l nodes=2:ppn=8
>> will start the job. qstat -f tells me that it is running on 16 cores,
>> but checking with "top" shows that the job is only running on 1 core
>> on 1 node (the node listed second in the nodes file). I could not
>> find anything in the MOM logs concerning errors.
>>
>> Any help would be much appreciated.
>>
>> Cheers, Jan
>>
>
--
Jan Dettmer, Postdoctoral Fellow
School of Earth and Ocean Sciences, University of Victoria
Victoria, BC V8W 3P6
office: (250) 472-4342 email: jand at uvic.ca
http://web.uvic.ca/~jand/