[torqueusers] OpenPBS to Torque upgrade

David Jackson jacksond at clusterresources.com
Tue Mar 8 15:29:35 MST 2005


Chris, Steve,

  We have found no reason in the code why a change in communication
protocol would impact queued workload.  As Chris mentioned, you should
be able to shut down all TORQUE components, rebuild, and then start
TORQUE.  Queued workload should pick up where you left off.  The next
time you upgrade from 1.2.x to a later release, you may even be able to
save most or all running jobs thanks to USC's MOM enhancements. 
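
  For reference, the shutdown/rebuild/restart cycle would look roughly like
the sketch below.  The exact way you stop pbs_mom on the nodes, the install
prefix, and the spool location are site-specific, so treat this as an
outline rather than a recipe:

    # Stop scheduling and the server; queued jobs stay on disk in the
    # server's spool area (typically $PBS_HOME/server_priv/jobs).
    qterm -t quick

    # Stop pbs_mom on every compute node (init script, kill, etc.).

    # Rebuild TORQUE with the new protocol setting and install it on the
    # server host and on all compute nodes.
    ./configure --disable-rpp
    make
    make install

    # Bring the daemons back up.
    pbs_mom       # on each compute node
    pbs_server
    pbs_sched     # or restart your external scheduler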

  Please let us know what you find.

Dave

On Tue, 2005-03-08 at 10:11 +1100, Chris Samuel wrote:
> On Tue, 8 Mar 2005 02:30 am, Steve Traylen wrote:
> 
> > Another migration question: if moving from torque-1.0.1p6 built with the
> > default '--enable-rpp' to torque-1.2.0p1 built with '--disable-rpp', what
> > are the potential pitfalls? I'm less worried about the torque upgrade
> > itself, but I assume changing the protocol would be significant.
> 
> Hmm, that's something I'm not sure about.  We've been upgrading from time to 
> time, sometimes with the system running and sometimes when electrical power 
> work has forced us to shut the cluster down.  I can't place when that 
> protocol change happened in the grand scheme of things, so I'm not in a 
> position to comment authoritatively.
> 
> > Should I drain running, queued jobs or both? Can I just upgrade everything
> > and restart everything and will everything be happy?
> 
> I would have thought that the change in the protocol between the MOMs and the 
> server would require restarting all the components, and if you have users 
> running Pete Wyckoff's mpiexec to launch parallel MPI jobs (as we do), those 
> jobs would certainly be adversely affected.
> 
> However, we've always upgraded with queued jobs waiting, and the only time 
> that this has bitten us was with the change to the length of the PBS job ID.
> 
> Of course, I have to disclaim all liability for this information, caveat 
> emptor, batteries not included, if it breaks you get to keep both pieces, 
> don't blame me if you lose all your queued jobs or your cluster develops 
> emergent behaviour and takes over the world...
> 
> In short, the SuperCluster developers would be more helpful than me on 
> this. ;-)
> 
> Good luck!
> Chris


