[torqueusers] Re: File system "snapshot" fails when using Torque
Garrick Staples
garrick at usc.edu
Wed Aug 3 18:13:08 MDT 2005
On Wed, Aug 03, 2005 at 04:30:33PM -0700, Rick.Ingham at expeditors.com alleged:
> I think my problem stems from --enable-plock-daemons. Here's a description
> from Sun of why the file system snapshot may be failing.
>
>    xntpd runs in the realtime class, and locks
>    down all of its pages with mlockall. This includes the xntpd binary
>    and the various libraries it is linked with. This prevents lockfs
>    from being able to acquire a write lock on the file system --- the
>    check in ufs_thaw_wlock fails:
I'm confused. My Solaris and Linux mlock/mlockall manpages state that they
"disable paging" (implies to me both swapping out dirty pages and releasing
clean code/data pages).
I don't see how it would prevent a write lock on the fs. I don't see how it
has anything to do with writing to the filesystem.
Someone please correct me. I'm curious about this.
Btw, torque doesn't use mlock/mlockall, it uses plock().
> I'm going to try using --disable-plock-daemons.
>
> By the way, what is the default value? Disabled? Enabled=0?
The configure script definitely defaults to disabled:
    # Check whether --enable-plock_daemons or --disable-plock_daemons was given.
    if test "${enable_plock_daemons+set}" = set; then
      enableval="$enable_plock_daemons"
      case "${enableval}" in
        yes) PLOCK_DAEMONS=7 ;;
        no)  PLOCK_DAEMONS=0 ;;
        *)   PLOCK_DAEMONS="${enableval}" ;;
      esac
    else
      PLOCK_DAEMONS=0
    fi
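
To make the mapping concrete, here is a minimal sketch that mirrors the case logic above in a standalone function. The name resolve_plock_daemons is hypothetical and is not part of the Torque configure script. It assumes the usual autoconf behaviour where a bare --enable-FEATURE passes "yes" and --disable-FEATURE passes "no".

```shell
#!/bin/sh
# Hypothetical helper (not in Torque): mirrors the configure excerpt above.
# Call it with no argument to model "option not given"; otherwise pass the
# value that configure would see in $enable_plock_daemons.
resolve_plock_daemons() {
    if [ "$#" -eq 0 ]; then
        # Neither --enable nor --disable was given: default is 0 (disabled).
        echo 0
        return
    fi
    case "$1" in
        yes) echo 7 ;;       # bare --enable-plock-daemons
        no)  echo 0 ;;       # --disable-plock-daemons
        *)   echo "$1" ;;    # --enable-plock-daemons=VALUE is passed through
    esac
}

resolve_plock_daemons        # option absent
resolve_plock_daemons yes    # --enable-plock-daemons
resolve_plock_daemons no     # --disable-plock-daemons
resolve_plock_daemons 1      # --enable-plock-daemons=1
```

Run as written, this prints 0, 7, 0 and 1, one per line. So a build that never mentioned the option ends up with the same value as one configured with --disable-plock-daemons.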
> --- Rick Ingham, Expeditors Int'l / IS
> ---- RICK.INGHAM at EXPEDITORS.COM (206) 674-3400 x3284 FAX 246-3197
>
> Rick Ingham/IS/Expeditors
> 07/29/2005 09:55 AM
> To: torqueusers at supercluster.org
> cc:
> Subject: File system "snapshot" fails when using Torque
>
> We have been using OpenPBS on 100+ servers (standalone systems with server,
> scheduler, and mom) for many months on Sun Solaris 9 Sparc systems. Last
> week we deployed Torque 1.2.0p5 to five servers. Since then, the file
> system snapshot (snapfs) of the /usr mount point has been failing with
> every daily backup.
>
> Our PBS programs are installed in:
> /usr/local/sbin
> /usr/local/bin
>
> PBSHOME is:
> /var/spool/PBS
>
> I have not been able to explain why the snapshot is getting wrapped around
> the axle. 'lsof' shows nothing on /usr that the pbs_* daemons have open or
> locked.
>
> Any ideas? I'm stumped.
>
> --- Rick Ingham, Expeditors Int'l / IS
> ---- RICK.INGHAM at EXPEDITORS.COM (206) 674-3400 x3284 FAX 246-3197
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California