[torqueusers] job only runs on 1 cpu

James A. Peltier jpeltier at cs.sfu.ca
Mon Jul 28 13:15:44 MDT 2008


On Sun, 27 Jul 2008, Jan Dettmer wrote:

> Hi all,
>
> I have a small cluster with 3 nodes, each node has 2 CPUs with 4 cores each.
> I have been using the cluster for a few months now and it works mostly great
> with pbs and open-mpi.
>
> One problem I have been running into for a while is the following:
>
> Starting a job with a script containing
> #PBS -l nodes=1:ppn=8
> works perfectly. The job starts on 1 node on all 8 cores.
>
> However
> #PBS -l nodes=2:ppn=8
> will start the job. qstat -f tells me that it is running on 16 cores, but
> checking with "top" shows that the job is only running on 1 core on 1 node
> (the node listed second in the nodes file).  I could not find anything in
> the MOM logs concerning errors.
>
> Any help would be much appreciated.
>
> Cheers, Jan
>

Did you compile Open-MPI with the --with-tm option enabled?  If not, Open-MPI
doesn't have a clue about the resources requested through PBS and you must
specify the -np option manually.  qstat -f only shows that you have
requested 16 cores; it doesn't know whether the job is actually using all
16 of them.
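
For the manual route, here is a minimal sketch of a job script, assuming
Torque has written one line per allocated core to $PBS_NODEFILE.  The
helper count_slots and the program name ./my_mpi_program are placeholders
of mine, not anything from your setup.  Note that without tm support,
mpirun also needs the node list itself (passed here with -machinefile),
otherwise all processes start on the local node.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# count_slots FILE: print the number of non-empty lines in FILE.
# Assumption: Torque lists each allocated core on its own line.
count_slots() {
    grep -c . "$1"
}

# Only launch when running inside a PBS job (Torque sets PBS_NODEFILE).
if [ -n "${PBS_NODEFILE:-}" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    # ./my_mpi_program is a hypothetical placeholder for the real binary.
    mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./my_mpi_program
fi
```

If Open-MPI was built with tm support, a plain "mpirun ./my_mpi_program"
inside the job should pick up the allocation on its own.  Running
"ompi_info | grep tm" should list the tm components if they were built.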

-- 
James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
Mobile  : 778-840-6434
E-Mail  : jpeltier at sfu.ca
Website : http://www.fas.sfu.ca | http://vivarium.cs.sfu.ca
MSN     : subatomic_spam at hotmail.com

