[torquedev] recent torque changes

Glen Beane glen.beane at gmail.com
Sat Feb 16 22:56:16 MST 2008

2008/2/2 Garrick Staples <garrick at usc.edu>:

> On Sat, Feb 02, 2008 at 02:45:51AM -0500, Glen Beane alleged:
> > I've just checked in some changes into trunk that increase the
> > constant from 11 to 61.  This allows for 64 char .JB and .SC files on
> > pbs_server/pbs_mom
> >
> > The previous 14-char limit was too small when you combine large job
> > sequence numbers and large job arrays - we just couldn't hash those 11
> > characters enough to make the necessary number of unique file names.
> >
> > This should help the job arrays scale much better.
> >
> > .JB files with the old size for their jobbase array are automatically
> > upgraded when pbs_server starts.
> > We used a similar auto upgrader from 2.1.x to 2.2.0.  The only downside
> > is that if you upgrade to 2.3.x you won't be able to recover your jobs if
> > you downgrade back to 2.2.x (they will be renamed as .BD files, I think).
>
> Would it make sense to rename the existing files when pbs_server or
> pbs_mom restarts?

I have about 90% of the code done for this, but I'm not convinced it's worth
the trouble.  It isn't too difficult, but the biggest pain is this:

The code to do this should be contained in the version-specific function
called by job_qs_upgrade.  job_qs_upgrade is called by job_recov, which is
passed a filename from pbsd_init.  If the upgrade renames the file, then by
the time control is back in pbsd_init the filename it holds may no longer be
valid.  If something goes wrong and job_recov fails, pbsd_init will try to
rename the file, but because its filename is stale the rename fails and an
error is logged.  The "bad" job file never gets the .BD extension, so
pbs_server will attempt to recover it again the next time it is restarted.
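
To make the stale-filename problem concrete, here is a minimal sketch in
plain C.  It is not TORQUE code: upgrade_rename and mark_bad are hypothetical
stand-ins for the rename step in the upgrade function and for the .BD rename
in the caller's error path, and the 256-byte buffer is an arbitrary
assumption.  One way around the problem is to have the callee write the new
name back into the caller's buffer:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the rename step inside a version-specific
 * upgrade function.  Renames the file on disk and, on success, copies the
 * new name back into the caller's buffer so the caller's copy stays valid.
 * Returns 0 on success, -1 on failure (caller's buffer is left untouched). */
int upgrade_rename(char *name, size_t namesz, const char *newname)
{
    if (strlen(newname) >= namesz)
        return -1;                  /* new name would not fit in the buffer */
    if (rename(name, newname) != 0)
        return -1;                  /* file still has its old name */
    strcpy(name, newname);          /* keep the caller's copy in sync */
    return 0;
}

/* Hypothetical stand-in for the caller's error path: swap the last
 * extension for .BD so the file is skipped on the next start.
 * Returns 0 on success, -1 on failure. */
int mark_bad(const char *name)
{
    char bad[256];
    const char *dot = strrchr(name, '.');
    size_t stem = dot ? (size_t)(dot - name) : strlen(name);

    if (stem + 4 > sizeof(bad))     /* stem + ".BD" + terminating NUL */
        return -1;
    memcpy(bad, name, stem);
    strcpy(bad + stem, ".BD");
    return rename(name, bad);       /* fails if name no longer exists */
}
```

If the caller keeps a separate copy of the old name, mark_bad on that copy
fails after a successful upgrade_rename, which is the failure described
above.  Passing the buffer down avoids that, but it also means touching every
function between the caller and the upgrade code.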

It won't hurt anything to keep the legacy filenames for jobs that are
recovered and upgraded.  The code to rename the files is growing much larger
than I had anticipated, and its impact is extending beyond the
version-specific function in job_qs_upgrade.c.

I think I would rather leave this code out.