[torqueusers] job only runs on 1 cpu

Steve Young chemadm at hamilton.edu
Mon Jul 28 16:11:07 MDT 2008


Hi Jan,
	You'll want to take PBS out of the picture and make sure the job
works first using just MPI. Once you know MPI is working, my guess is
that the problem is MPI jobs not getting the proper node information.
I had to install OSC's mpiexec to keep MPI jobs under the control of
PBS and get them allocated to nodes correctly.

http://www.osc.edu/~pw/mpiexec/index.php
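One way to check whether a job is getting the proper node information
is to look at the file Torque names in $PBS_NODEFILE. Assuming it lists
one hostname per allocated processor slot, a nodes=2:ppn=8 request
should produce 16 lines spread over two hosts. The Python sketch below
summarizes that file. The mpirun line it prints is only the manual
fallback (explicit -np plus a machine file) for an Open MPI build
without tm support. ./my_program is a hypothetical executable name. A
tm-aware launcher such as OSC's mpiexec reads the allocation itself.

```python
"""Summarize the node allocation Torque hands to a job (a sketch).

Assumption: $PBS_NODEFILE lists one hostname per allocated slot, so
nodes=2:ppn=8 should give 16 lines across 2 distinct hosts.
"""
import os
from collections import Counter


def summarize_nodefile(lines):
    """Return (slots per host in first-seen order, total slot count)."""
    hosts = [line.strip() for line in lines if line.strip()]
    counts = Counter(hosts)  # Counter keeps insertion order (Python 3.7+)
    return dict(counts), len(hosts)


def mpirun_args(nodefile_path, total, program):
    """Build an explicit Open MPI command line for a non-tm build.

    'program' is a hypothetical executable name, not from the thread.
    """
    return ["mpirun", "-np", str(total), "-machinefile", nodefile_path, program]


def main():
    path = os.environ.get("PBS_NODEFILE")
    if not path:
        # Outside a PBS job there is no allocation to inspect.
        print("PBS_NODEFILE is not set; run this inside a PBS job script")
        return
    with open(path) as f:
        per_host, total = summarize_nodefile(f)
    for host, n in per_host.items():
        print(f"{host}: {n} slot(s)")
    print(f"total slots: {total}")
    print("manual launch:", " ".join(mpirun_args(path, total, "./my_program")))


if __name__ == "__main__":
    main()
```

If the per-host counts look right but top still shows a single process,
the allocation is reaching the job and the launcher is what is ignoring it.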

Hope this helps,

-Steve


On Jul 28, 2008, at 5:58 PM, Jan Dettmer wrote:

> Thanks for the tip.
>
> I just recompiled with --with-tm.
>
> Still the same problem.
>
> #PBS -l nodes=1:ppn=8 runs fine (without the -np option in the
> mpiexec command) on 8 CPUs on one node.
>
> #PBS -l nodes=2:ppn=8 will only start on one CPU on one node.
>
> Cheers, Jan
>
>
> James A. Peltier wrote:
> > Did you compile Open MPI with the --with-tm option enabled? If not,
> > Open MPI doesn't have a clue about the options passed through PBS,
> > and you must specify the -np option manually. The qstat -f output
> > will only show that you have requested 16 cores; it doesn't really
> > know that the job isn't using all 16 cores.
> >
>> On Sun, 27 Jul 2008, Jan Dettmer wrote:
>>> Hi all,
>>>
>>> I have a small cluster with 3 nodes; each node has 2 CPUs with 4
>>> cores each. I have been using the cluster for a few months now, and
>>> it mostly works great with PBS and Open MPI.
>>>
>>> One problem I have been running into for a while is the following:
>>>
>>> Starting a job with a script containing
>>> #PBS -l nodes=1:ppn=8
>>> works perfectly. The job starts on 1 node on all 8 cores.
>>>
>>> However
>>> #PBS -l nodes=2:ppn=8
>>> will start the job. qstat -f tells me that it is running on 16
>>> cores, but checking with "top" shows that the job is only running on
>>> 1 core on 1 node (the node listed second in the nodes file).
>>> I could not find any errors in the MOM logs.
>>>
>>> Any help would be much appreciated.
>>>
>>> Cheers, Jan
>>>
>
>
> -- 
> Jan Dettmer, Postdoctoral Fellow
> School of Earth and Ocean Sciences, University of Victoria	
> Victoria, BC V8W 3P6
> office: (250) 472-4342	email: jand at uvic.ca
> http://web.uvic.ca/~jand/
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


