TORQUE Resource Manager
1. Node state down
2. Missing nodes
3. "Bad UID for job execution" error in job submissions
4. Assigning number of processors each node contains
1. Problem:
Node state down
When I run the command pbsnodes -a, the state of all nodes is down.
Solutions:
1. Check whether a MOM (pbs_mom) is running on each compute node:
Run ps -ef | grep pbs_mom.
If no MOM is running on a node, start it by running pbs_mom.
Then restart pbs_server and check pbsnodes -a again.
2. Check that the configuration on each compute node correctly points to the head node (the host running pbs_server).
For each compute host, the MOM (pbs_mom) must be configured to trust the pbs_server daemon. In TORQUE 2.0.0p4 and earlier, this is done by creating the "$(TORQUECFG)/mom_priv/config" file and setting the $pbsserver parameter:
    $pbsserver headnode    # note: hostname running pbs_server
In TORQUE 2.0.0p5 and later, this can also be done by creating the "$(TORQUECFG)/server_name" file and placing the server hostname inside:
    headnode
3. Check that the server host can ping each MOM host by hostname, and that each MOM host can ping the server by hostname.
4. If you have a firewall in place, make sure ports 15000 through 15004 are open.
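When only some nodes are down, it helps to list exactly which ones before working through the checks above. The sketch below is a hypothetical helper (list_down_nodes is not part of TORQUE) that filters pbsnodes -a output. It assumes the usual output layout, where each node name starts a line and its attributes follow on indented "attribute = value" lines such as "state = down".

```shell
#!/bin/sh
# list_down_nodes: read `pbsnodes -a` output on stdin and print the name of
# every node whose state value contains "down" (e.g. "down" or "down,offline").
# Assumption: node names are unindented; attributes are indented lines of
# the form "state = <value>".
list_down_nodes() {
    awk '
        /^[^ \t]/ { node = $1; next }                # unindented line: a node name
        $1 == "state" && $3 ~ /down/ { print node }  # state line of that node
    '
}

# Typical use on the head node:
#   pbsnodes -a | list_down_nodes
```

Each name it prints is a node to run checks 1 through 4 against.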
2. Problem:
Missing nodes
When I run pbsnodes -a, not all of my nodes are displayed.
Solutions:
1. Add the missing node by running qmgr -c "create node [name]".
2. Check that all the nodes are listed in the nodes file on the head node, located at "$(TORQUECFG)/server_priv/nodes".
If any are missing, add them (one node name per line), restart pbs_server, and run pbsnodes -a again to check whether the missing nodes are now displayed. For example:
    node001
3. If a node is listed but its state is down, see problem 1.
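On a larger cluster, a small script can compare the nodes you expect against what pbsnodes -a reports and print the qmgr command for each one that is missing. This is a sketch under assumptions: missing_nodes and the host list file are hypothetical, the list holds one node name per line, and node names in pbsnodes -a output appear unindented at the start of a line. The commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# missing_nodes FILE: read `pbsnodes -a` output on stdin and print a qmgr
# command for every node named in FILE that pbsnodes does not report.
# Only the first field of each FILE line is used; blank lines and lines whose
# first field starts with # are skipped.
missing_nodes() {
    awk '
        # phase 1: the expected node names from FILE
        phase == 1 && NF && $1 !~ /^#/ { expected[++n] = $1 }
        # phase 2: reported node names are the unindented pbsnodes lines
        phase == 2 && /^[^ \t]/ { reported[$1] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(expected[i] in reported))
                    printf "qmgr -c \"create node %s\"\n", expected[i]
        }
    ' phase=1 "$1" phase=2 -
}

# Typical use (my_hosts.txt is a hypothetical list of your node names):
#   pbsnodes -a | missing_nodes my_hosts.txt
```

The phase=1 and phase=2 operands are awk command-line assignments, so each input is identified correctly even when the host list file is empty.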
3. Problem:
"Bad UID for job execution" error in job submissions
When I submit a job as root using the qsub command, I receive the error "Bad UID for job execution."
Solutions:
1. TORQUE does not allow job submissions from root. Switch to another user and submit the job again.
2. You must submit the job from a node that is allowed to submit jobs. By default, the head node can do this. To submit a job from another node, see the acl_hosts parameter.
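A submission script can catch the root case from solution 1 before qsub is ever called. The helper below is hypothetical (check_submit_uid is not a TORQUE command); it only checks for UID 0 and does not check whether the host is allowed to submit jobs.

```shell
#!/bin/sh
# check_submit_uid [UID]: return 1 (with a message on stderr) if UID is 0,
# i.e. root; otherwise return 0. With no argument, the current user's UID
# from `id -u` is checked.
check_submit_uid() {
    uid=${1:-$(id -u)}
    if [ "$uid" -eq 0 ]; then
        echo "TORQUE does not accept job submissions from root; switch to another user." >&2
        return 1
    fi
    return 0
}

# Hypothetical use in a submission script (job.sh is a placeholder):
#   check_submit_uid && qsub job.sh
```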
4. Problem:
Assigning number of processors each node contains
My nodes have dual processors, but TORQUE only displays one processor.
Solution:
1. You must configure TORQUE to recognize that the node has more than one processor. You can do this in one of two ways:
Run the command qmgr -c "set node [name] np=[number of procs]".
or
In the "$(TORQUECFG)/server_priv/nodes" file, add np=[number of procs] after the node name on that node's line.
Restart pbs_server and run pbsnodes -a again. For example, a dual-processor node's entry looks like this:
    node001 np=2
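Editing the nodes file by hand is error-prone when many nodes need updating. The awk sketch below shows one way to set np for a single node. set_np is a hypothetical helper, and it assumes the nodes file layout described above: the node name first, followed by space-separated attributes such as np=2.

```shell
#!/bin/sh
# set_np FILE NODE NP: print FILE with NODE's line carrying np=NP.
# An existing np= attribute on that line is replaced; its other attributes
# are kept, and every other line (including comments) is printed unchanged.
# Output goes to stdout so it can be reviewed before replacing the real file.
set_np() {
    awk -v node="$2" -v np="$3" '
        $1 == node {
            line = $1
            for (i = 2; i <= NF; i++)
                if ($i !~ /^np=/) line = line " " $i   # keep other attributes
            print line " np=" np
            next
        }
        { print }
    ' "$1"
}

# Example: write the result to a new file, check it, then move it into place
# and restart pbs_server:
#   set_np nodes node001 2 > nodes.new && mv nodes.new nodes
```

Printing to stdout instead of editing in place keeps the original file intact if the node name is mistyped.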

