[torquedev] Re: binary change to .JB files in 2.3-fixes branch!

Glen Beane glen.beane at gmail.com
Mon Mar 30 13:00:21 MDT 2009


Hi Josh,

Sorry to fly off the handle over the weekend :)

I can definitely live with adding minor enhancements to a bug-fix
branch if they don't affect anything until they have been configured.
So if I update from 2.3.6 to 2.3.7 the behavior does not change unless
I explicitly set some new qmgr configuration settings.  The scope of
these changes would have to be small enough that they can be
thoroughly tested to make sure they don't have any unintended
consequences.

It would definitely be good to document the process.

-glen


On Mon, Mar 30, 2009 at 2:00 PM, Josh Butikofer
<josh at clusterresources.com> wrote:
> Glen and everyone else who's interested:
>
> First of all, let me explain what is going on with TORQUE 2.3. We have been
> developing a lot of enhancements and bug fixes in a separate 2.3 branch.
> Many of these fixes and enhancements we considered either very important or
> very beneficial. Because these changes have been used and tested thoroughly
> in several environments, we felt that a good number of them could be put
> into TORQUE 2.3.7. We started this migration process last week. We were
> planning on announcing all of these changes on the mailing list and asking
> developers/users to review/test them. This is akin to peer review and beta
> testing--and it looks like you've already started the peer review part. :-)
>
> We haven't released TORQUE 2.3.7 yet because we know that any change in the
> code increases the chance for bugs. We weren't planning on releasing 2.3.7
> without more testing and review from the community.
>
> Some changes that went into 2.3.7 last week during our huge merge operation
> shouldn't have. Some are because of oversights on our part. Some are simply
> accidental. For example, this HOSTNAME change was an oversight. Another
> change that is too significant for 2.3.7 is the change to sockets. This will
> be removed as well, but will be present in TORQUE 2.4. There may be others,
> and we will need to look at all of them again.
>
> Let me know if I'm wrong, but it seems the core of what you are suggesting
> is this:
>
> * TORQUE 2.3.x should not include anything but minor bug fixes.
> * All new features, enhancements, and more intrusive bug fixes should go
>   into a non-stable branch, which is now called trunk.
>
> Up to this point, CRI's developers have been operating under a slightly
> different model:
>
> * TORQUE 2.3.x can include bug fixes, enhancements and features that can be
>   easily turned off or on (do not affect default behavior). We want to keep
>   the branch stable, but also feel it is important to continue to make the
>   product better for users, without having to wait months and months for the
>   next major release to come out.
>
> * "Trunk" (or TORQUE 2.4) can have pretty much any big or small change put
>   into it that is deemed unfit for TORQUE 2.3.x. I've always felt that this
>   is ... dangerous and unwieldy.
>
> It is obvious that there are differing opinions on how TORQUE's development
> should be handled. So it seems to me that we need to come up with some
> specific guidelines and hold everyone to them to avoid such situations in
> the future. But I think it is important that all the stakeholders get a say
> in those guidelines. We should also, perhaps, send a posting to the TORQUE
> development list before doing any check-ins or merges to double-check that
> our change is a worthy one, without any risks or concerns that we haven't
> thought of. What do you think?
>
> As you know, CRI has customers who depend on our ability to deliver
> important features and bug fixes to users in stable branches of TORQUE--in a
> timely manner. I also see that other customers or users just want TORQUE to
> remain stable and "not fixed, unless it's broken." We need to find a way to
> satisfy both scenarios.
>
> Glen Beane wrote:
>>
>> the change
>>
>> #define PBS_MAXHOSTNAME  64 /* max host name length */
>>
>> to
>>
>> #define PBS_MAXHOSTNAME  1024 /* max host name length */
>>
>> results in a change in the size of the ji_qs struct, which is what is
>> saved in the .JB file.  This requires adding support for this upgrade
>> to job_qs_upgrade so existing .JB files get upgraded to the new struct
>> layout after a TORQUE upgrade, and it would be impossible to downgrade
>> to a previous 2.3 release without draining the system of running jobs.
>>
>> this is how the size of ji_jobid in the ji_qs struct is defined:
>>
>> #define PBS_MAXSERVERNAME PBS_MAXHOSTNAME /* max server name length */
>> #define PBS_MAXSVRJOBID  (PBS_MAXSEQNUM + PBS_MAXSERVERNAME + \
>>     PBS_MAXPORTNUM + PBS_MAXJOBARRAYLEN + 2) /* server job id size */
>>
>> This change _needs_ to be pulled out of 2.3-fixes. We should not be
>> making changes to this structure in "bug fix" releases.  I am going to
>> change this back to 64 in 2.3-fixes, and leave it as 1024 in trunk.
>
> I agree--it was a mistake to put this into TORQUE 2.3 due to the changing of
> the job structure. It is proper to leave this in trunk, since some users
> have hostnames that are larger than 64 bytes, and RFCs say hostnames can be
> up to 255 characters in length. Ideally, we should someday change the way
> TORQUE stores job info to make it less brittle.
>
>> Also, we really should not be adding new features into 2.3-fixes
>> (accounting_keep_days, log_keep_days, lock_file).
>
> Again, I think there is a differing philosophy here and we need to have some
> more discussion to decide what our guidelines will be.
>
> Regards,
>
> Josh Butikofer
>
>
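
For anyone following along, the struct-size problem quoted above can be
sketched roughly as follows. This is only an illustration, not the real
TORQUE code: the DEMO_* values, the demo_qs_* structs and their extra fields,
and the helper functions are all made up for the example. Only the shape of
the job id size formula comes from the macros quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values, assumed for illustration only (not from TORQUE headers). */
#define DEMO_MAXSEQNUM      8
#define DEMO_MAXPORTNUM     5
#define DEMO_MAXJOBARRAYLEN 6

/* Same shape as the quoted PBS_MAXSVRJOBID formula, parameterised on the
 * maximum host name length. */
#define DEMO_JOBID_SIZE(hostlen) \
    (DEMO_MAXSEQNUM + (hostlen) + DEMO_MAXPORTNUM + DEMO_MAXJOBARRAYLEN + 2)

/* Hypothetical, heavily simplified stand-ins for the saved job record. */
struct demo_qs_old {
    int  ji_state;                            /* made-up leading field  */
    char ji_jobid[DEMO_JOBID_SIZE(64) + 1];   /* host name limit 64     */
    int  ji_priority;                         /* made-up trailing field */
};

struct demo_qs_new {
    int  ji_state;
    char ji_jobid[DEMO_JOBID_SIZE(1024) + 1]; /* host name limit 1024   */
    int  ji_priority;
};

/* Job id buffer size for a given maximum host name length. */
size_t demo_jobid_size(int hostlen)
{
    assert(hostlen > 0);
    return (size_t)DEMO_JOBID_SIZE(hostlen);
}

/* A reader that loads the record as one fixed-size blob can only accept a
 * file whose record size matches the struct it was compiled with.
 * Returns 1 if the sizes match, 0 otherwise. */
int demo_record_loadable(size_t on_disk_size, size_t expected_size)
{
    return on_disk_size == expected_size;
}
```

With these made-up layouts, a record written with the 64-byte limit is
smaller than the struct compiled with the 1024-byte limit, and every field
after ji_jobid sits at a different offset, so a fixed-size read in either
direction cannot work without an explicit conversion step.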

