[torqueusers] job only runs on 1 cpu

James A. Peltier jpeltier at cs.sfu.ca
Mon Jul 28 13:15:44 MDT 2008

On Sun, 27 Jul 2008, Jan Dettmer wrote:

> Hi all,
> I have a small cluster with 3 nodes, each node has 2 CPUs with 4 cores each.
> I have been using the cluster for a few months now and it works mostly great
> with pbs and open-mpi.
> One problem I have been running into for a while is the following:
> Starting a job with a script containing
> #PBS -l nodes=1:ppn=8
> works perfectly. The job starts on 1 node on all 8 cores.
> However
> #PBS -l nodes=2:ppn=8
> will start the job. qstat -f tells me that it is running on 16 cores but 
> checking with "top" shows that the job is only running on 1 core on 1 node
> (the node listed second in the nodes file).  I could not find anything in
> the MOM logs concerning errors.
> Any help would be much appreciated.
> Cheers, Jan

Did you compile Open-MPI with the --with-tm option enabled?  If not, Open-MPI
doesn't have a clue about the resources PBS allocated to the job, and you
must specify the -np option manually.  qstat -f only shows that you have
requested 16 cores; it doesn't know that the job isn't actually using all 16.
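
If rebuilding Open-MPI isn't convenient right away, a job script along the
following lines may serve as a workaround.  This is only a sketch under some
assumptions: ./my_mpi_program is a placeholder for the real binary, the helper
names count_slots and build_mpirun_cmd are made up for illustration, and it
relies on $PBS_NODEFILE listing one hostname per allocated core, which is the
usual TORQUE behaviour.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# To see whether this Open-MPI build has TM support, "ompi_info | grep tm"
# normally lists tm components when it does.

# Count the slots TORQUE granted. The nodefile has one line per allocated
# core, so its line count is the total number of MPI processes to start.
count_slots() {
    # tr strips the padding some wc implementations add around the number
    wc -l < "$1" | tr -d ' '
}

# Build the mpirun command line for a given nodefile and program
# (hypothetical helper; it only prints the command).
build_mpirun_cmd() {
    nodefile=$1
    program=$2
    printf 'mpirun -np %s -machinefile %s %s\n' \
        "$(count_slots "$nodefile")" "$nodefile" "$program"
}

# TORQUE sets PBS_NODEFILE inside a running job; outside a job this
# section is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    cmd=$(build_mpirun_cmd "$PBS_NODEFILE" ./my_mpi_program)
    echo "launching: $cmd"
    # Unquoted on purpose so the string splits into words; this assumes
    # the nodefile path and program name contain no spaces.
    $cmd
fi
```

With nodes=2:ppn=8 the nodefile should have 16 lines, so this launches 16
processes across both nodes.  Passing -np without a machinefile would
most likely start all of them on the first node.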

James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
Mobile  : 778-840-6434
E-Mail  : jpeltier at sfu.ca
Website : http://www.fas.sfu.ca | http://vivarium.cs.sfu.ca
MSN     : subatomic_spam at hotmail.com
