[torqueusers] Torque 1.1.0p2 compatibility with Myrinet?

Song, Kai Song KSong at lbl.gov
Wed Jul 22 13:52:48 MDT 2009


Hi All,

With our Torque scheduler, a job submitted to a single node runs fine. However, when we submit a job to two or more nodes, the nodes do not communicate with each other, and the job hangs until it times out.

We have tested our Open MPI program manually as follows:
/home/software/ompi/1.3.2-pgi/bin/mpirun -machinefile ./nodes -np 16 ./helloworld

It works fine, so we have ruled out a problem with Open MPI or with the Myrinet connection. The only thing left is the Torque scheduler, which is a very old version (1.1.0p2).
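
For comparison, here is a rough sketch of the same test run as a Torque job, so that the node list Torque hands to the job can be checked against the manual ./nodes file. The job name, the nodes=2:ppn=8 resource request (chosen only to match -np 16), and the summarize_nodefile helper are assumptions for illustration and are not our actual submission script.

```shell
#!/bin/sh
#PBS -N helloworld
#PBS -l nodes=2:ppn=8
#PBS -j oe
# The directives above are assumed values; adjust them to the real cluster.

# Hypothetical helper: print each host in a machinefile with its slot count.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{print $2, $1}'
}

# Run the MPI test only when Torque has supplied a node list.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    echo "Nodes allocated by Torque:"
    summarize_nodefile "$PBS_NODEFILE"

    # One MPI rank per line of the Torque node file.
    NP=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
    /home/software/ompi/1.3.2-pgi/bin/mpirun \
        -machinefile "$PBS_NODEFILE" -np "$NP" ./helloworld
fi
```

Submitted with qsub, the job output should list the hosts Torque allocated. If that list differs from the manual ./nodes file, or the job hangs before it is printed, that would point at Torque rather than Open MPI.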

So we are wondering whether this very old Torque does not support Myrinet, in which case we would need to build a newer version. Does anyone know more about Torque 1.1.0p2, and could you help us with this issue?

Thanks in advance,

Kai



--------------------
Kai Song
<ksong at lbl.gov> 1.510.486.4894
High Performance Computing Services (HPCS) Intern
Lawrence Berkeley National Laboratory - http://scs.lbl.gov


