[torqueusers] tm problem OSX PPC single node
garrick at clusterresources.com
Fri Feb 23 13:07:17 MST 2007
On Fri, Feb 23, 2007 at 02:30:04PM -0500, Brock Palen alleged:
> We have found a problem when trying to run jobs using tm on only one
> node, which is quite strange. If the MPI library (LAM or OpenMPI)
> uses 2 nodes (nodes=2:ppn=2), the job starts just fine. But if it
> uses 1 (nodes=1:ppn=2), the job cannot start. This is not a problem
> for serial jobs. We are also using the same versions of torque and
> lam/openmpi on our linux cluster with no problems. If I build a LAM
> without tm support, the jobs run fine.
> I dug through the archives and found some references to a similar
> problem. I'm just wondering what I should do to test it, or whether
> this is a known problem on OSX? The systems are running 10.3 on G5's, it's using
I'm not aware of any current OSX issues.
It is easy to isolate this to TM with 'pbsdsh'. Just run something like
'pbsdsh hostname' in jobs of the different sizes (nodes=1:ppn=2 and
nodes=2:ppn=2).
If this fails, then it is definitely a TM problem. Otherwise it should
be punted to the OpenMPI peeps.
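
A minimal sketch of such a test job, not a definitive procedure. The
script name tm-test.sh, the job name, and the helper functions
count_lines and tm_check are made up for illustration. It assumes that
pbsdsh with no options starts one copy of the command per allocated
processor slot and that $PBS_NODEFILE has one line per slot, so
nodes=1:ppn=2 should print the hostname twice.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=2
#PBS -N tm-test

# count_lines TEXT: print the number of non-empty lines in TEXT.
count_lines() {
    printf '%s\n' "$1" | grep -c .
}

# tm_check EXPECTED TEXT: succeed if TEXT has exactly EXPECTED non-empty lines.
tm_check() {
    [ "$(count_lines "$2")" -eq "$1" ]
}

# Only run the real test inside a job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    # Assumption: one line in the node file per processor slot.
    expected=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')

    # Spawn 'hostname' through TM on every allocated slot.
    output=$(pbsdsh hostname 2>&1)
    status=$?
    echo "$output"

    if [ "$status" -eq 0 ] && tm_check "$expected" "$output"; then
        echo "TM OK: $expected tasks spawned"
    else
        echo "TM FAILED: pbsdsh exit status $status, expected $expected lines"
    fi
fi
```

Submit it as is for the single-node case, then again with
'qsub -l nodes=2:ppn=2 tm-test.sh' (options on the qsub command line
should take precedence over the #PBS line) and compare the results.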