[torqueusers] job only runs on 1 cpu
James A. Peltier
jpeltier at cs.sfu.ca
Mon Jul 28 13:15:44 MDT 2008
On Sun, 27 Jul 2008, Jan Dettmer wrote:
> Hi all,
>
> I have a small cluster with 3 nodes, each node has 2 CPUs with 4 cores each.
> I have been using the cluster for a few months now and it works mostly great
> with pbs and open-mpi.
>
> One problem I have been running into for a while is the following:
>
> Starting a job with a script containing
> #PBS -l nodes=1:ppn=8
> works perfectly. The job starts on 1 node on all 8 cores.
>
> However
> #PBS -l nodes=2:ppn=8
> will start the job. qstat -f tells me that it is running on 16 cores but
> checking with "top" shows that the job is only running on 1 core on 1 node
> (the node listed second in the nodes file). I could not find anything in
> the MOM logs concerning errors.
>
> Any help would be much appreciated.
>
> Cheers, Jan
>
Did you compile Open-MPI with the --with-tm option enabled? If not, Open-MPI
doesn't have a clue about the resources allocated through PBS, and you must
specify the -np option and the list of hosts manually. The qstat -f output
only shows that you have requested 16 cores; it doesn't really know whether
the job is actually using all 16 of them.
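
For the manual route, a job script along these lines might work. This is only
a minimal sketch, not a tested recipe for your cluster: it assumes Torque sets
$PBS_NODEFILE to a file with one line per allocated core, and the names
count_slots and ./my_mpi_program are placeholders I made up for illustration.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# Hypothetical helper: count the lines in a node file. Torque is assumed to
# write one line per allocated core, so this is the number of MPI processes.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Only launch when running under PBS (PBS_NODEFILE is set by Torque).
if [ -n "${PBS_NODEFILE:-}" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    # Without TM support, tell mpirun explicitly how many processes to start
    # and which hosts to use. ./my_mpi_program is a placeholder binary.
    mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./my_mpi_program
fi
```

To check whether your build has TM support at all, "ompi_info | grep tm"
should list tm components if it does. With TM support compiled in, a plain
"mpirun ./my_mpi_program" inside the job should pick up the allocation
without the -np and -machinefile arguments.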
--
James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
Simon Fraser University - Burnaby Campus
Phone : 778-782-6573
Fax : 778-782-3045
Mobile : 778-840-6434
E-Mail : jpeltier at sfu.ca
Website : http://www.fas.sfu.ca | http://vivarium.cs.sfu.ca
MSN : subatomic_spam at hotmail.com