[torqueusers] PBS mom not starting on node on reboot

Anand Nilekar aunilekar at wisc.edu
Tue Jun 19 09:17:31 MDT 2007


Hi all,

We recently installed Torque on our local cluster, and everything was
running fine.

However, we had to reboot some of the nodes because of other issues, and
when a node is rebooted, pbs_mom doesn't start on it. The pbs_server on
the master node seems to be running fine, except that it marks the node
as "DOWN", even several minutes after the reboot (allowing for the
server's 10-minute cycle for pinging the node). When we try to restart
pbs_mom locally on the node, we get the following error:
_______________________________________________________________
Starting PBS
pbs_mom: Permission denied (13) in chk_file_sec, Security violation with
"/var/spool/torque/spool/"
PBS mom
_______________________________________________________________
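
Since the error names a permission check (chk_file_sec) on a specific
directory, the ownership and mode of that directory and of each of its
parents are probably the first thing to look at. Below is a minimal
sketch for collecting that information; the helper names describe and
walk_up are made up for illustration, and the only path used is the one
from the error message. As far as I understand, Torque expects the spool
directory to be owned by root with mode 1777 (drwxrwxrwt), but that is an
assumption that should be checked against the documentation for the
installed version.

```python
import os
import stat

# Path taken from the error message above.
TARGET = "/var/spool/torque/spool/"


def describe(path):
    """Return (path, owner uid, group gid, mode string) for one path."""
    st = os.lstat(path)  # lstat: report on a symlink itself, not its target
    return path, st.st_uid, st.st_gid, stat.filemode(st.st_mode)


def walk_up(path):
    """Yield describe() for path and then each parent up to the root."""
    current = os.path.abspath(path)
    while True:
        yield describe(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent


if __name__ == "__main__":
    try:
        for p, uid, gid, mode in walk_up(TARGET):
            # Assumption: the spool directory should show drwxrwxrwt, uid=0.
            print(f"{mode} uid={uid} gid={gid} {p}")
    except OSError as exc:
        print(f"could not inspect {TARGET}: {exc}")
```

Running this on a node where pbs_mom starts correctly and on one where it
fails, and comparing the two listings, should show whether the reboot
changed anything along that path.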

I should mention that this is being done as 'root'. Any
clues/suggestions?

Thank you very much in advance.

Anand





