[torqueusers] Upgrade from 2.1.*

Glen Beane glen.beane at gmail.com
Tue Aug 17 11:46:27 MDT 2010


On Tue, Aug 17, 2010 at 1:38 PM, Joshua Bernstein
<jbernstein at penguincomputing.com> wrote:
> David,
>
> As Sarah points out, there are problems with doing that large of an
> upgrade while jobs are running. As you know, the job data structure has
> changed quite a bit from 2.1 to 2.4, and thus currently running jobs are
> detected as corrupted, and generally unpredictable things seem to
> happen. More often than not, when pbs_server starts back up it sees
> these jobs, but isn't able to correctly process them, and thus they get
> lost or worse, sometimes get killed. My advice would be to do the
> upgrade only after draining the system of jobs. In some cases when a
> complete drain isn't an option, I've done this using a rolling upgrade
> with two pbs_server instances and two sets of pbs_moms running.


I've been adding code that can "upgrade" the server's .JB files
from previous versions of torque. Any time there is a change to the
job structure that I know about (sometimes other developers change it
without my knowing), I add upgrade code to handle that case. I do
limited testing, but the code isn't as robust or as extensively
tested as I would like. I am pretty sure I tested upgrading from 2.1
to 2.2 and 2.3 with queued (but not running) jobs, but I don't think
I ever did any further testing of upgrades from 2.1.

Some developers have been talking about the idea of an XML-based job
file. That would let us get around things like hard-coded string
lengths in the job struct, which sometimes need to be enlarged and
so make .JB files incompatible between torque versions. Of course,
this would come at a performance penalty.
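
To illustrate the kind of incompatibility described above, here is a
minimal Python sketch. It is not torque code: the real job struct is
written in C and has many more fields. The record layout, the field
sizes (16 and 64 bytes), the version numbers, and the function names
are all made up for the example. It shows why enlarging a fixed-length
string field changes the on-disk record size, and how a version tag at
the start of the record lets upgrade code convert an old record
instead of treating it as corrupt.

```python
import struct

# Hypothetical layouts, NOT the real torque job struct:
# version (int32), job id (fixed-length, NUL-padded), state (int32).
LAYOUT_V1 = struct.Struct("<i16si")  # job id field is 16 bytes
LAYOUT_V2 = struct.Struct("<i64si")  # same field enlarged to 64 bytes


def write_v1(job_id: str, state: int) -> bytes:
    """Serialize a record in the old (v1) layout."""
    return LAYOUT_V1.pack(1, job_id.encode("ascii"), state)


def upgrade_record(raw: bytes) -> bytes:
    """Return the record in the v2 layout, converting from v1 if needed."""
    # The version tag is the first field in both layouts.
    (version,) = struct.unpack_from("<i", raw)
    if version == 2:
        if len(raw) != LAYOUT_V2.size:
            raise ValueError("v2 record has unexpected size")
        return raw
    if version == 1:
        if len(raw) != LAYOUT_V1.size:
            raise ValueError("v1 record has unexpected size")
        _, job_id, state = LAYOUT_V1.unpack(raw)
        # Drop the old NUL padding; pack() re-pads to the new length.
        return LAYOUT_V2.pack(2, job_id.rstrip(b"\0"), state)
    raise ValueError(f"unknown record version {version}")


if __name__ == "__main__":
    old = write_v1("42.headnode", 1)
    new = upgrade_record(old)
    print(len(old), len(new))
```

Reading the old bytes directly with the new layout fails, because the
record sizes differ. A self-describing format such as XML avoids the
fixed sizes altogether, at the cost of parsing overhead.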

