[Mauiusers] Problem with Torque/Maui
garrick at clusterresources.com
Wed Jan 24 20:45:59 MST 2007
On Wed, Jan 24, 2007 at 08:51:11AM +0530, S Ranjan alleged:
> I have torque pbs_server running on the headnode, which is also the
> submit host. There are 32 other compute nodes, mentioned in
> /var/spool/torque/server_priv/nodes file. There is a single queue at
> present. Sometimes, MPI jobs requesting 28/30 nodes end up
> running on the head node, though the head node is not a compute node at
> all. netstat -anp shows several sockets being opened for the job, and
> eventually the head node hangs.
> Appreciate any help/suggestion on this.
Which MPI? MPICH? I'd guess mpirun is using the default machinefile
that is created when mpich is built, and not the hostlist provided by
the PBS job.
Run mpirun with "-machinefile $PBS_NODEFILE" or use OSC's mpiexec
instead of mpirun: http://www.osc.edu/~pw/mpiexec/
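
A minimal sketch of the first option, assuming an MPICH-style mpirun that accepts -np and -machinefile. The node count, the program name ./my_mpi_app, and the helper name mpi_launch_line are hypothetical, not taken from the original report. The helper only prints the command line built from the job's host list, so it can be checked before a real run:

```shell
#!/bin/sh
#PBS -l nodes=28
# Sketch only: the node count above and ./my_mpi_app below are hypothetical.

# Print the mpirun command line that uses the hosts TORQUE allocated to this
# job, instead of the default machinefile created when MPICH was built.
mpi_launch_line() {
    if [ -z "${PBS_NODEFILE:-}" ] || [ ! -r "$PBS_NODEFILE" ]; then
        echo "PBS_NODEFILE is not set or not readable; not inside a PBS job?" >&2
        return 1
    fi
    # TORQUE writes one line per allocated processor slot to $PBS_NODEFILE,
    # so the number of non-empty lines is the process count to start.
    nprocs=$(grep -c . "$PBS_NODEFILE")
    echo "mpirun -np $nprocs -machinefile $PBS_NODEFILE $*"
}

# In a real job script the equivalent direct call would be:
#   mpirun -np "$nprocs" -machinefile "$PBS_NODEFILE" ./my_mpi_app
```

If the head node is not listed in /var/spool/torque/server_priv/nodes, it should not appear in $PBS_NODEFILE either, so a launch built this way should stay on the compute nodes.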