[torqueusers] Torque with OpenMPI
quickparser at gmail.com
Thu Feb 21 12:27:01 MST 2008
You may be right. I'm no programmer ;)
However, I have one more question for you. I have 11 working nodes, each with
2 processors (actually 2 logical cores on a P4 with HT). So I have 22
processors, and Torque recognizes them as well.
When I want to submit a job to more than 11 nodes, it won't allow me to do so.
I can't tell you the exact message as I don't have access to my cluster (not
even remotely) at the moment.
Is there a way to set it up? I'm sorry I can't tell you any further details.
Anyway, thank you very much for that code. It works 100% now.
On Thu, Feb 21, 2008 at 7:55 PM, Craig West <cwest at astro.umass.edu> wrote:
> There isn't actually a processor lost. I just guessed at how the code
> worked before I had seen the code itself. After looking at the code you
> can see that the first processor sends and receives messages to all the
> other processors. It doesn't send one to itself.
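
[A minimal sketch of the pattern described above, not the code discussed in
this thread. It is a plain Python simulation with no MPI calls, and the
function name master_exchange is hypothetical. It only illustrates why a job
with N processes shows N-1 exchanges: rank 0 talks to every other rank but
never to itself.]

```python
def master_exchange(num_procs):
    """Simulate rank 0 exchanging one message with every other rank.

    Returns a list of (rank, sent, reply) tuples, one per worker rank.
    This is a simulation only; no real message passing takes place.
    """
    exchanges = []
    # Start at 1: rank 0 is the sender and does not message itself.
    for rank in range(1, num_procs):
        sent = f"message from 0 to {rank}"
        reply = f"reply from {rank} to 0"
        exchanges.append((rank, sent, reply))
    return exchanges


if __name__ == "__main__":
    for rank, sent, reply in master_exchange(4):
        print(rank, sent, reply)
```

[So seeing one fewer exchange than the number of processes would be expected
with this pattern, and does not by itself mean a processor went unused.]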
> > It seems to me that one processor is still lost, but I have no bug
> > info with this.
> > However, when I run it using torque, the job seems to be hung. 'showq'
> > shows that the job is running but never finishes.
> > All my nodes are running now. qstat -f tells me that the job was
> > assigned to these hosts:
> > exec_host = f135-15/1+f135-15/0+f135-14/1+f135-14/0+f135-13/1+f135-13/0+f135-12/0
> > I'm thankful for your time and effort.
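
[A small sketch for reading the exec_host value quoted above, assuming the
usual "host/slot" entries joined by "+" and assuming the wrapped final entry
is f135-12/0. The helper name slots_per_host is hypothetical.]

```python
from collections import Counter


def slots_per_host(exec_host):
    """Count how many slots each host contributes in an exec_host string.

    Assumes entries of the form 'host/slot' separated by '+'.
    """
    hosts = [entry.split("/")[0] for entry in exec_host.split("+") if entry]
    return Counter(hosts)


# Value taken from the qstat -f output quoted in this message,
# with the line wrap joined (an assumption about the original output).
exec_host = (
    "f135-15/1+f135-15/0+f135-14/1+f135-14/0+"
    "f135-13/1+f135-13/0+f135-12/0"
)

counts = slots_per_host(exec_host)
total = sum(counts.values())

if __name__ == "__main__":
    for host, n in sorted(counts.items()):
        print(host, n)
    print("total slots:", total)
```

[Read this way, the job was given slots on four hosts: two each on f135-15,
f135-14 and f135-13, and one on f135-12.]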