[torqueusers] PBS mom not starting on node on reboot

Rajiv Chittajallu rajive at ieee.org
Tue Jun 19 11:42:17 MDT 2007


Anand wrote on 06/19/07 at 10:17:31 -0500:
>Hi all,
>
>We recently installed Torque on our local cluster, and everything was
>running fine.
>
>However, we had to reboot some of the nodes because of other issues, and
>when a node is rebooted, the pbs_mom doesn't start on the node. The
>pbs_server on the master node seems to be running fine, except that it
>marks the node as "DOWN", even several minutes after the reboot
>(allowing for the server's 10-minute cycle of pinging the node). When we
>try to restart pbs_mom locally on the node, we get the following
>error:
>_______________________________________________________________
>Starting PBS
>pbs_mom: Permission denied (13) in chk_file_sec, Security violation with
>"/var/spool/torque/spool/"
>PBS mom
>_______________________________________________________________

Check the permissions on /var/spool/torque/spool. It should not be world
writable or have the sticky bit set. All directories in the path must be
owned by root.
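
A quick way to find which component of the path trips the check is to walk
it from / downward and look at each directory's owner and mode bits. The
sketch below is a hypothetical Python helper (path_components, audit and
problems are made-up names, not part of Torque). It encodes only the rules
stated above (owned by root, no world-write bit, no sticky bit); those rules
are taken from this reply, not from Torque's chk_file_sec source, so treat
any finding as a hint rather than a verdict.

```python
import os
import stat
import sys
from typing import List, NamedTuple


class DirCheck(NamedTuple):
    path: str
    uid: int
    world_writable: bool
    sticky: bool


def path_components(path: str) -> List[str]:
    """Return every directory from / down to `path`, e.g.
    /var/spool/torque/spool -> [/, /var, /var/spool, /var/spool/torque,
    /var/spool/torque/spool]."""
    path = os.path.abspath(path)  # also drops any trailing slash
    parts = []
    while True:
        parts.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return list(reversed(parts))


def audit(path: str) -> List[DirCheck]:
    """Stat each directory component (following symlinks) and record
    its owner uid and the world-write and sticky bits."""
    results = []
    for d in path_components(path):
        st = os.stat(d)
        results.append(DirCheck(
            path=d,
            uid=st.st_uid,
            world_writable=bool(st.st_mode & stat.S_IWOTH),
            sticky=bool(st.st_mode & stat.S_ISVTX),
        ))
    return results


def problems(checks: List[DirCheck]) -> List[str]:
    """Flag components that break the rules stated in the reply above
    (assumption: these mirror what pbs_mom objects to)."""
    msgs = []
    for c in checks:
        if c.uid != 0:
            msgs.append(f"{c.path}: owned by uid {c.uid}, not root")
        if c.world_writable:
            msgs.append(f"{c.path}: world writable")
        if c.sticky:
            msgs.append(f"{c.path}: sticky bit set")
    return msgs


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/spool/torque/spool"
    for msg in problems(audit(target)):
        print(msg)
```

Run it on the node as, for example, python3 check_spool.py
/var/spool/torque/spool (the script name is arbitrary). It prints one line
per problem and nothing at all if every directory in the path passes.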

>
>I should mention that this is being done as root. Any clues or
>suggestions?
>
>Thank you very much in advance.
>
>Anand

