[torqueusers] Multi-node jobs get one node
jungelsman at hotmail.com
Thu Oct 2 09:03:40 MDT 2008
I'm a new Torque user. In fact, this is for the directed project for my master's degree, so I hope someone can help.

My small cluster consists of 4 physical nodes and 4 virtual machine nodes. I can ssh from any machine to any other machine without being prompted for a password. My shared home folder is on a Lustre file system; none of the Torque binaries or libraries are located there. I am also using Maui as the scheduler.

When I submit multi-node jobs, even when I request specific machines, only the first node in the job's exec_host list does any processing. I thought the problem might be node-specific, so I killed pbs_mom on every node except two, and no matter which of the two is first in the list, it is the only one that does any work. My executables are from NPB 2.4; I've tried both the CG and the EP benchmarks.

What on earth is going on?
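One way to narrow a problem like this down is to check, from inside the job itself, which hosts Torque actually allocated. The following is only a minimal sketch, assuming the script is run from a job script under Torque, where the PBS_NODEFILE environment variable points to a file listing one hostname per allocated slot. The function name count_slots is made up for illustration.

```python
import os
from collections import Counter


def count_slots(nodefile_text):
    """Return a Counter mapping hostname -> number of slots listed.

    Hypothetical helper: expects the contents of a node file with one
    hostname per line, and ignores blank lines.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


def main():
    # Assumption: Torque sets PBS_NODEFILE inside a running job.
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; probably not running inside a Torque job")
        return
    with open(path) as f:
        slots = count_slots(f.read())
    # Report how many slots were allocated on each host.
    for host, n in sorted(slots.items()):
        print(f"{host}: {n} slot(s)")
    print(f"{len(slots)} distinct host(s), {sum(slots.values())} slot(s) total")


if __name__ == "__main__":
    main()
```

If this reports several distinct hosts while the benchmark processes still run on only the first one, that would suggest the allocation itself is fine and the MPI launcher is not being given the node list. That is a guess to verify, not a confirmed diagnosis.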
Any help would be appreciated.