[torqueusers] Torque 1.1.0p2 compatibility with Myrinet?
Song, Kai Song
KSong at lbl.gov
Wed Jul 22 13:52:48 MDT 2009
With our Torque scheduler, a job submitted to a single node runs fine. However, when we submit a job to two or more nodes, the nodes do not communicate with each other, and the job hangs until it times out.
We have tested our Open MPI program manually as follows:
/home/software/ompi/1.3.2-pgi/bin/mpirun -machinefile ./nodes -np 16 ./helloworld
It runs fine, so we have ruled out Open MPI itself and the Myrinet connection as the cause. That leaves the Torque scheduler, which is a very old version (1.1.0p2).
So we are wondering whether our very old Torque lacks Myrinet support, in which case we would need to build a newer version. Does anyone know more about Torque 1.1.0p2 and could help us with this issue?
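For comparison with the manual test, here is a minimal sketch of how the same launch could be run inside a Torque job. It is not the poster's actual job script. It relies on standard Torque behavior: Torque writes one line per allocated process slot to the file named by `$PBS_NODEFILE`, sets `$PBS_O_WORKDIR` to the submission directory, and sets `$PBS_JOBID` inside a job. The request `nodes=2:ppn=8` (2 nodes x 8 cores = 16 processes) is an assumption. The `MPIRUN` override and the helper functions `count_slots`, `list_hosts`, and `run_job` are hypothetical names introduced only for this illustration.

```shell
#!/bin/bash
# Hypothetical Torque job script reproducing the 16-process manual test.
# Assumption: 2 nodes x 8 cores each; adjust nodes/ppn to the real cluster.
#PBS -N helloworld
#PBS -l nodes=2:ppn=8
#PBS -j oe

# mpirun path taken from the manual test; MPIRUN is a hypothetical override.
MPIRUN="${MPIRUN:-/home/software/ompi/1.3.2-pgi/bin/mpirun}"

# Count allocated process slots: Torque's node file has one line per slot.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# List the distinct hosts in the allocation (confirms >1 node was granted).
list_hosts() {
    sort -u "$1"
}

run_job() {
    # Torque starts batch jobs in $HOME; move to the directory qsub ran from.
    cd "$PBS_O_WORKDIR" || exit 1
    echo "Allocated hosts:"
    list_hosts "$PBS_NODEFILE"
    # Same launch as the manual test, using Torque's node file as the machinefile.
    "$MPIRUN" -machinefile "$PBS_NODEFILE" -np "$(count_slots "$PBS_NODEFILE")" ./helloworld
}

# Launch only when running inside a Torque job (Torque sets PBS_JOBID).
if [ -n "${PBS_JOBID:-}" ]; then
    run_job
fi
```

Such a script would be submitted with `qsub`. If the host list it prints differs from the hand-written `./nodes` file, that difference is one place to look when the multi-node job hangs.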
Thanks in advance,
<ksong at lbl.gov> 1.510.486.4894
High Performance Computing Services (HPCS) Intern
Lawrence Berkeley National Laboratory - http://scs.lbl.gov