[torqueusers] Re: File system "snapshot" fails when using Torque

Garrick Staples garrick at usc.edu
Wed Aug 3 18:13:08 MDT 2005


On Wed, Aug 03, 2005 at 04:30:33PM -0700, Rick.Ingham at expeditors.com alleged:
> I think my problem stems from --enable-plock-daemons.  Here's a description
> from Sun of why the file system snapshot may be failing.
> 
>       xntpd runs in the realtime class, and locks
>       down all of its pages with mlockall.  This includes the xntpd binary
>       and the various libraries it is linked with.  This prevents lockfs
>       from being able to acquire a write lock on the file system --- the
>       check in ufs_thaw_wlock fails:

I'm confused.  My Solaris and Linux mlock/mlockall manpages state that they
"disable paging" (which implies to me both swapping out dirty pages and
releasing clean code/data pages).

I don't see how it would prevent a write lock on the fs.  I don't see how it
has anything to do with writing to the filesystem.

Someone please correct me.  I'm curious about this.

Btw, Torque doesn't use mlock/mlockall; it uses plock().

 
> I'm going to try using --disable-plock-daemons.
> 
> By the way, what is the default value?  Disabled?  Enabled=0?

The configure script definitely defaults to disabled:

# Check whether --enable-plock_daemons or --disable-plock_daemons was given.
if test "${enable_plock_daemons+set}" = set; then
  enableval="$enable_plock_daemons"
  case "${enableval}" in
  yes) PLOCK_DAEMONS=7 ;;
  no)  PLOCK_DAEMONS=0 ;;
  *) PLOCK_DAEMONS="${enableval}" ;;
esac
else
  PLOCK_DAEMONS=0
fi
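For reference, a minimal sketch of how a PLOCK_DAEMONS value could be read. It assumes (this is not shown in the configure excerpt) that the value is a bitmask with 1 = pbs_server, 2 = pbs_sched and 4 = pbs_mom, which is consistent with "yes" mapping to 7 (all three daemons) and the default 0 (disabled). The plock_targets function is hypothetical and not part of Torque.

```shell
#!/bin/sh
# Hypothetical helper: list which daemons would call plock() for a given
# PLOCK_DAEMONS value.
# ASSUMPTION: bit 1 = pbs_server, bit 2 = pbs_sched, bit 4 = pbs_mom.
# The configure excerpt only confirms yes -> 7 and no/default -> 0.
plock_targets() {
  val=$1
  out=""
  # POSIX shell arithmetic: test each bit of the value
  if [ $((val & 1)) -ne 0 ]; then out="$out pbs_server"; fi
  if [ $((val & 2)) -ne 0 ]; then out="$out pbs_sched"; fi
  if [ $((val & 4)) -ne 0 ]; then out="$out pbs_mom"; fi
  # Print "none" when no bit is set, otherwise drop the leading space
  if [ -z "$out" ]; then
    echo none
  else
    echo "${out# }"
  fi
}

plock_targets 0   # default or --disable-plock-daemons; prints: none
plock_targets 7   # --enable-plock-daemons (yes); prints: pbs_server pbs_sched pbs_mom
```

Under that assumption, a build configured without the option ends up with 0, so no daemon locks itself into memory.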


 
> --- Rick Ingham, Expeditors Int'l / IS
> ---- RICK.INGHAM at EXPEDITORS.COM  (206) 674-3400 x3284   FAX  246-3197
> 
> 
> From:    Rick Ingham/IS/Expeditors
> Date:    07/29/2005 09:55 AM
> To:      torqueusers at supercluster.org
> cc:
> Subject: File system "snapshot" fails when using Torque
> 
> We have been using OpenPBS on 100+ servers (standalone systems with server,
> scheduler, and mom) for many months on Sun Solaris 9 SPARC systems.  Last
> week we deployed Torque 1.2.0p5 to five servers.  Since then, the file
> system snapshot (snapfs) of the /usr mount point has failed during every
> daily backup.
> 
> Our PBS programs are installed in:
>       /usr/local/sbin
>       /usr/local/bin
> 
> PBSHOME is:
>       /var/spool/PBS
> 
> I have not been able to explain why the snapshot is getting wrapped around
> the axle.  'lsof' shows nothing on /usr that the pbs_* daemons have open or
> locked.
> 
> Any ideas?  I'm stumped.
> 

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California

