[torquedev] proposed change in directory structure
garrick at usc.edu
Fri Jul 11 14:43:53 MDT 2008
On Fri, Jul 11, 2008 at 04:28:43PM -0400, Glen Beane alleged:
> I've been working on some changes in trunk that transfer the .OU and .ER
> spool files from pbs_mom back to pbs_server. This is one of the steps we
> need to take so that a job in the COMPLETE state can be restarted from a
> checkpoint file. (the files are only returned to the server if
> keep_completed is positive and the job has a checkpoint file)
> There are problems when the spool directory is shared between pbs_server and
> the mother superior pbs_mom. What happens is that when the files are "returned"
> pbs_server takes ownership of the .ER and .OU files in the spool dir, and
> when pbs_mom forks to the user to copy the files back to the user's home
> directory it is unable to do so because of a permission denied error. I
> feel that the cleanest solution is to just separate the pbs_server and
> pbs_mom spool directories. In my current working copy of trunk I have
> changed pbs_server to use server_home/server_spool instead of
> server_home/spool. pbs_mom continues to use server_home/spool. This solves
> my problems because when the spool files are returned to pbs_server pbs_mom
> retains its copy in its own spool directory. It is then free to fork to the
> user to copy the files and then delete them.
> Are there any objections to this change in trunk? (the change will be
> introduced with the release of TORQUE 2.4.0)
So we're doing a useless copy from server_home/spool to
server_home/server_spool? At my site, these files are often a significant
percentage of the filesystem. If a file is more than 50% of the total
filesystem, then this is going to fail.
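To make the arithmetic concrete, here is a rough sketch in Python (not TORQUE code; the function names and the used_by_others parameter are made up for illustration). It assumes both spool directories live on the same filesystem, so a second copy needs the file's size again in free space:

```python
import os
import shutil


def fits_on_same_filesystem(file_size, total_size, used_by_others=0):
    """Pure arithmetic: can a second copy of a file coexist with the original?

    The original already occupies file_size, and the copy needs file_size
    more, so both plus everything else must fit within total_size.
    """
    return 2 * file_size + used_by_others <= total_size


def copy_would_fit(src_path, dest_dir):
    """Check against the live filesystem whether src_path can be duplicated
    into dest_dir. This only compares the file size with the free space
    reported right now; it does not reserve anything.
    """
    size = os.path.getsize(src_path)
    free = shutil.disk_usage(dest_dir).free
    return size <= free


if __name__ == "__main__":
    # A file using 60% of an otherwise empty filesystem cannot be copied.
    print(fits_on_same_filesystem(60, 100))
    # A file using 40% fits only if other data uses 20% or less.
    print(fits_on_same_filesystem(40, 100, used_by_others=30))
```

A check like copy_would_fit is inherently racy, since other jobs may write to the spool between the check and the copy, so it would only reduce failures rather than prevent them.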
Why not just have the server check if it already has the file and not issue a
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.