[torqueusers] job only runs on 1 cpu

Jan Dettmer jand at uvic.ca
Mon Jul 28 15:58:54 MDT 2008


Thanks for the tip.

I just recompiled with --with-tm.

Still the same problem.

#PBS -l nodes=1:ppn=8 runs fine on all 8 CPUs of one node (with no -np
option on the mpiexec command line).

#PBS -l nodes=2:ppn=8 only starts on one CPU on one node.
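
One way to narrow this down is to compare what TORQUE granted with what
Open MPI can see. The following is only a diagnostic sketch, not a
confirmed fix. It assumes the job script runs under /bin/sh, that
$PBS_NODEFILE (set by TORQUE inside a job) lists one line per granted
slot, and that ompi_info from the same Open MPI build is on the PATH.
The helper names count_slots and count_hosts and the program name
./my_mpi_program are made up for illustration.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# Number of slots in a node file (one line per slot).
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Number of distinct hosts in a node file.
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Inside a TORQUE job, report what was actually granted.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "slots: $(count_slots "$PBS_NODEFILE")" \
         "hosts: $(count_hosts "$PBS_NODEFILE")"
fi

# If Open MPI was built with TM support, ompi_info should list
# components named "tm"; no match suggests the build lacks it.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -w tm || echo "no tm components listed by ompi_info"
fi

# Possible manual fallback when TM support is missing (hypothetical
# program name; left commented out on purpose):
# mpiexec -np "$(count_slots "$PBS_NODEFILE")" \
#         -machinefile "$PBS_NODEFILE" ./my_mpi_program
```

If the node file shows 16 slots on 2 hosts but ompi_info lists no tm
components, mpiexec is probably not the one that was just rebuilt, or
the rebuild did not find the TORQUE libraries.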

Cheers, Jan


James A. Peltier wrote:
> Did you compile Open-MPI with the --with-tm option enabled?  If not,
> Open-MPI doesn't have a clue about the options passed through PBS and
> you must specify the -np option manually.  qstat -f will only show
> that you have requested 16 cores; it doesn't know that the job is
> not actually using 16 cores.
>
> On Sun, 27 Jul 2008, Jan Dettmer wrote:
> 
>> Hi all,
>>
>> I have a small cluster with 3 nodes; each node has 2 CPUs with 4 cores
>> each. I have been using the cluster for a few months now and it works
>> mostly great with pbs and open-mpi.
>>
>> One problem I have been running into for a while is the following:
>>
>> Starting a job with a script containing
>> #PBS -l nodes=1:ppn=8
>> works perfectly. The job starts on 1 node on all 8 cores.
>>
>> However
>> #PBS -l nodes=2:ppn=8
>> will start the job. qstat -f tells me that it is running on 16 cores,
>> but checking with "top" shows that the job is only running on 1 core
>> on 1 node (the node listed second in the nodes file).  I could not
>> find anything in the MOM logs concerning errors.
>>
>> Any help would be much appreciated.
>>
>> Cheers, Jan
>>
> 


-- 
Jan Dettmer, Postdoctoral Fellow
School of Earth and Ocean Sciences, University of Victoria	
Victoria, BC V8W 3P6
office: (250) 472-4342	email: jand at uvic.ca
http://web.uvic.ca/~jand/

