[torqueusers] Re: File system "snapshot" fails when using Torque

Rick.Ingham at expeditors.com Rick.Ingham at expeditors.com
Wed Aug 3 17:30:33 MDT 2005


I think my problem stems from --enable-plock-daemons.  Here's a description
from Sun of why the file system snapshot may be failing.

      xntpd runs in the realtime class, and locks
      down all of its pages with mlockall.  This includes the xntpd binary
      and the various libraries it is linked with.  This prevents lockfs
      from being able to acquire a write lock on the file system --- the
      check in ufs_thaw_wlock fails:

I'm going to try using --disable-plock-daemons.
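
For anyone not familiar with the call mentioned in Sun's note, here is a
minimal sketch of how a process pins all of its pages with mlockall.  It is
only an illustration of the mechanism Sun describes for xntpd.  The helper
names lock_all_pages and unlock_all_pages are hypothetical, and I am not
claiming this is the code path the Torque daemons take when built with the
plock option.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: pin every page the process has mapped now
 * (MCL_CURRENT) and every page it maps later (MCL_FUTURE) into RAM.
 * This covers the program text and its shared libraries, which is
 * what Sun's note says gets in the way of the file system write lock.
 * Returns 0 on success, -1 on failure (for example, when the caller
 * lacks the privilege or exceeds its locked-memory limit). */
static int lock_all_pages(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical helper: release all page locks held by the process. */
static int unlock_all_pages(void)
{
    return munlockall();
}
```

A daemon would typically call something like lock_all_pages() once at
startup.  If the locked pages include binaries or libraries that live on
/usr, that would fit Sun's explanation of why the snapshot cannot get its
write lock there.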

By the way, what is the default value?  Disabled?  Enabled=0?

--- Rick Ingham, Expeditors Int'l / IS
---- RICK.INGHAM at EXPEDITORS.COM  (206) 674-3400 x3284   FAX  246-3197


From:    Rick Ingham/IS/Expeditors
Date:    07/29/2005 09:55 AM
To:      torqueusers at supercluster.org
Subject: File system "snapshot" fails when using Torque



We have been using OpenPBS on 100+ servers (standalone systems with server,
scheduler, and mom) for many months on Sun Solaris 9 SPARC systems.  Last
week we deployed Torque 1.2.0p5 to five servers.  Since then, the file
system snapshot (snapfs) of the /usr mount point has been failing with
every daily backup.

Our PBS programs are installed in:
      /usr/local/sbin
      /usr/local/bin

PBSHOME is:
      /var/spool/PBS

I have not been able to explain why the snapshot is getting wrapped around
the axle.  'lsof' shows nothing on /usr that the pbs_* daemons have open or
locked.

Any ideas?  I'm stumped.

--- Rick Ingham, Expeditors Int'l / IS
---- RICK.INGHAM at EXPEDITORS.COM  (206) 674-3400 x3284   FAX  246-3197


