[torqueusers] job only runs on 1 cpu
chemadm at hamilton.edu
Mon Jul 28 16:11:07 MDT 2008
You'll want to take PBS out of the picture first and make sure the job
works using MPI alone. Once you know MPI itself is working, my guess is
that the MPI job is not getting the proper node information from PBS.
I had to install OSC's mpiexec to get MPI jobs to stay under the control
of PBS and be allocated to nodes correctly.
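
As a rough sketch of the kind of check I mean (the function name
summarise_nodefile, the file name hosts and the program name
./my_mpi_program are made up for illustration, not taken from your
setup), something like this at the top of the job script prints how
many slots Torque actually handed to the job on each host:

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# Step 1 (outside PBS, by hand): confirm plain MPI can span both nodes,
# for example with a hand-written host file (names are placeholders):
#   mpirun -np 16 --hostfile hosts ./my_mpi_program

# Step 2 (inside the job): summarise the node list Torque gives the job.
# $PBS_NODEFILE lists one line per allocated slot, so counting repeated
# host names gives the number of slots per host.
summarise_nodefile() {
    sort "$1" | uniq -c | awk '
        { printf "%s slots=%d\n", $2, $1; total += $1 }
        END { printf "total=%d\n", total }'
}

# Only run the summary when we really are inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarise_nodefile "$PBS_NODEFILE"
fi
```

For nodes=2:ppn=8 I would expect two host names with 8 slots each. If
the node file looks right but the processes still land on one core,
the problem is more likely in how MPI is launched than in the
allocation itself.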
Hope this helps,
On Jul 28, 2008, at 5:58 PM, Jan Dettmer wrote:
> Thanks for the tip.
> I just recompiled with --with-tm.
> Still the same problem.
> #PBS -l nodes=1:ppn=8 will run fine (without -np option in mpiexec
> command) on 8 cpus on one node.
> #PBS -l nodes=2:ppn=8 will only start on one CPU on one node.
> Cheers, Jan
> James A. Peltier wrote:
> > Did you compile Open-MPI with the --with-tm option enabled? If not,
> > Open-MPI doesn't have a clue about the options passed through PBS, and
> > you must specify the -np option manually. qstat -f will only show
> > that you have requested 16 cores; it doesn't really know that the job
> > is not using 16 cores.
>> On Sun, 27 Jul 2008, Jan Dettmer wrote:
>>> Hi all,
>>> I have a small cluster with 3 nodes, each node has 2 CPUs with 4
>>> cores each.
>>> I have been using the cluster for a few months now and it works
>>> mostly great
>>> with pbs and open-mpi.
>>> One problem I have been running into for a while is the following:
>>> Starting a job with a script containing
>>> #PBS -l nodes=1:ppn=8
>>> works perfectly. The job starts on 1 node on all 8 cores.
>>> #PBS -l nodes=2:ppn=8
>>> will start the job. qstat -f tells me that it is running on 16
>>> cores but checking with "top" shows that the job is only running
>>> on 1 core on 1 node (the node listed second in the nodes file).
>>> I could not find anything in the MOM logs concerning errors.
>>> Any help would be much appreciated.
>>> Cheers, Jan
> Jan Dettmer, Postdoctoral Fellow
> School of Earth and Ocean Sciences, University of Victoria
> Victoria, BC V8W 3P6
> office: (250) 472-4342 email: jand at uvic.ca