[torqueusers] Torque with OpenMPI
quickparser at gmail.com
Thu Feb 21 10:09:42 MST 2008
This was great, Craig! I would never have guessed that the code might be buggy.
I copied the recompiled binary to all my nodes (I still don't have NFS).
Now, when I run the code like this:
q-parser@f135-3:~$ mpirun -np 7 --hostfile zoznam test_app
0(f135-4): We have 7 processors
0(f135-4): Hello 1! Processor 1 (f135-5) reporting for duty
0(f135-4): Hello 2! Processor 2 (f135-6) reporting for duty
0(f135-4): Hello 3! Processor 3 (f135-7) reporting for duty
0(f135-4): Hello 4! Processor 4 (f135-8) reporting for duty
0(f135-4): Hello 5! Processor 5 (f135-9) reporting for duty
0(f135-4): Hello 6! Processor 6 (f135-11) reporting for duty
It seems to me that one processor is still lost, but I have no bug info with
However, when I run it using Torque, the job seems to hang. 'showq' shows
that the job is running, but it never finishes.
q-parser@f135-3:~$ showq
JOBNAME USERNAME STATE PROC REMAINING
113 q-parser Running 7 00:49:29 Thu Feb 21
1 Active Job 7 of 22 Processors Active (31.82%)
4 of 11 Nodes Active (36.36%)
My script looks like this:
#PBS -N test_job
#PBS -q batch
#PBS -l nodes=7
#PBS -l cput=00:02:00
All my nodes are running now. qstat -f tells me that the job was assigned to
I'm thankful for your time and effort.
On Thu, Feb 21, 2008 at 5:37 PM, Craig West <cwest at astro.umass.edu> wrote:
> It is buggy code. The simple problem is that idstr is only 32 chars.
> When you sprintf the long string at line 45 of the code you are writing
> past the end of the idstr buffer, segfaults and the like will occur. Change
> the size of idstr to be 64 and try again. Don't go too much bigger than
> 64 as you will cause problems with BUFSIZE.
> I should note that it crashed here when I ran it, works fine with the
> > If anybody might know of anything that could help me I'm listening.
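
The overflow Craig describes (a 32-byte idstr overrun by sprintf) can be sketched in C as below. This is only a minimal illustration, not the original test_app source: the helper name format_id, the constant IDSTR_SIZE, and the format string are assumptions made for the example. Only idstr, its sizes (32 and 64), and the use of sprintf come from the thread.

```c
#include <stdio.h>
#include <string.h>

/* 64 is the size suggested in the thread; the original buffer was 32.
 * IDSTR_SIZE is a hypothetical name, not taken from the original code. */
#define IDSTR_SIZE 64

/* Hypothetical helper: formats a rank/host label into dst and never
 * writes more than dst_size bytes (terminating NUL included).
 * Returns 0 on success, -1 if the text did not fit or formatting failed. */
int format_id(char *dst, size_t dst_size, int rank, const char *host)
{
    /* snprintf returns the length the full string would have had,
     * so a value >= dst_size means the output was truncated. */
    int needed = snprintf(dst, dst_size, "Processor %d (%s) ", rank, host);
    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1;
    }
    return 0;
}
```

Called as format_id(idstr, sizeof idstr, myid, hostname), where myid and hostname stand in for whatever the real program uses, a return of -1 flags a label that did not fit instead of silently writing past the end of the buffer.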