[torquedev] Re: proposed change in directory structure

Glen Beane glen.beane at gmail.com
Mon Jul 14 11:55:29 MDT 2008


On Mon, Jul 14, 2008 at 1:44 PM, Scott Jackson <scott at clusterresources.com>
wrote:

> On Fri, 2008-07-11 at 16:28 -0400, Glen Beane wrote:
> > I've been working on some changes in trunk that transfer the .OU
> > and .ER spool files from pbs_mom back to pbs_server. This is one of
> > the steps we need to take so that a job in the COMPLETE state can be
> > restarted from a checkpoint file.  (the files are only returned to the
> > server if keep_completed is positive and the job has a checkpoint
> > file)
> >
> > There are problems when the spool file is shared between pbs_server
> > and the mother superior pbs_mom. What happens is that when the files
> > are "returned" pbs_server takes ownership of the .ER and .OU files in
> > the spool dir and when pbs_mom forks to the user to copy the files
> > back to the user home directory they are unable to do so because of a
> > permission denied error.  I feel that the cleanest solution is to just
> > separate the pbs_server and pbs_mom spool directories.  In my current
> > working copy of trunk I have changed pbs_server to use
> > server_home/server_spool instead of server_home/spool.  pbs_mom
> > continues to use server_home/spool.  This solves my problems because
> > when the spool files are returned to pbs_server pbs_mom retains its
> > copy in its own spool directory. It is then free to fork to the user
> > to copy the files and then delete them.
> >
> > Are there any objections to this change in trunk? (the change will be
> > introduced with the release of TORQUE 2.4.0)
> >
>
>
> No objections from me. This seems like a good approach. Personally, if I
> were the architect, I would have a mom, server and sched dir and under
> these, I would have log,spool,priv and other such directories. I know it
> is a big change. For me, the price of progress is worth it. It would
> have to be done in a minor version change (such as 2.3 to 2.4) and would
> have to be announced ostentatiously in the release notes.



What about the wasted disk space?  If pbs_mom and pbs_server are on the
same node, pbs_server will get its own copy and moments later pbs_mom
will delete its copy, so for that short period of time we have two copies
of the spool files, plus the wasted time of doing the pbs_mom to
pbs_server file transfer...

I think we can get around that, but the code is going to be a bit of a
hack, which is why I originally suggested separate spool directories and no
differentiation between pbs_server and pbs_mom running on the same host or
on different hosts.
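
To make the "separate spool directories" idea concrete, here is a rough
sketch of how each daemon could pick its spool path under server_home. The
helper name spool_dir_for, the daemon_kind enum, and the example path are
made up for illustration and are not taken from the TORQUE source; only the
directory names server_spool and spool come from the proposal above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration only, not from the TORQUE tree. */
enum daemon_kind { DAEMON_SERVER, DAEMON_MOM };

/*
 * Build the spool directory path for a daemon under server_home.
 * pbs_server gets server_home/server_spool, pbs_mom keeps server_home/spool.
 * The choice depends only on which daemon is asking, never on whether the
 * two daemons happen to share a host.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 */
int spool_dir_for(enum daemon_kind kind, const char *server_home,
                  char *buf, size_t buflen)
{
    const char *sub = (kind == DAEMON_SERVER) ? "server_spool" : "spool";
    int n = snprintf(buf, buflen, "%s/%s", server_home, sub);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

With an assumed server_home of /var/spool/torque, pbs_server would resolve
/var/spool/torque/server_spool and pbs_mom /var/spool/torque/spool, so the
returned .OU and .ER files never collide with the copies pbs_mom still has
to deliver to the user.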