[torqueusers] Transparently upgrading Torque?

Garrick Staples garrick at clusterresources.com
Tue Nov 28 00:32:45 MST 2006


On Mon, Nov 27, 2006 at 02:32:07PM -0800, Michael Durket alleged:
> I'm trying to upgrade our old torque-1.0.1p5 installation
> to torque-2.1.6. Since the current installation is running
> our production workload constantly, I've installed the new
> version on a separate machine. I ran into a problem in that
> the older MOMs won't talk to multiple servers, so I installed
> a new MOM on one of our unused execution nodes. It will (obviously)
> talk to the new release of torque, but appears to have problems
> with the older torque, generating these messages in the torque
> log:
> 
> 11/27/2006 14:26:23;0001;PBS_Server;Svr;PBS_Server;Success (0) in stream_eof, connection to xyz dropped.  setting node state to down in stream_eof
> 
> which seems due to the MOM on node xyz doing this:
> 
> 11/27/2006 14:26:23;0002;   pbs_mom;n/a;mom_main;connection to server abc timeout
> 11/27/2006 14:26:23;0002;   pbs_mom;n/a;mom_main;hello sent to server abc
> 
> I'm assuming this is actually a problem in the PBS server code (since it seems 
> to be reporting "Success" as if it were an error). Is there any way around 
> this (short of fixing the old code)? My problem is that I need to keep the 
> old system running simultaneously with testing - I can't test the new torque
> with the old MOMs and it now appears the old torque won't work with the new 
> MOMs.

The protocol has changed since 1.0; the old and new versions just won't talk to each other.
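
To see which nodes the old server is dropping this way, a rough sketch like the one below can scan the server log for the stream_eof message. The semicolon-separated field layout and the field names are assumptions inferred only from the log lines quoted above, not from any documented format, and the helper names (parse_log_line, dropped_nodes) are hypothetical.

```python
import re

# Field layout ASSUMED from the quoted log lines:
# timestamp;event code;daemon;class;object;message
LOG_FIELDS = ("timestamp", "code", "daemon", "obj_class", "obj_name", "message")

# Matches the server-side message quoted in the original post.
DROPPED = re.compile(r"in stream_eof, connection to (\S+) dropped")


def parse_log_line(line):
    """Split one log record into named fields; return None if it does not fit."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return dict(zip(LOG_FIELDS, (p.strip() for p in parts)))


def dropped_nodes(lines):
    """Return node names, in first-seen order, reported as dropped in stream_eof."""
    seen = []
    for line in lines:
        rec = parse_log_line(line)
        if rec is None:
            continue
        match = DROPPED.search(rec["message"])
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    sample = [
        "11/27/2006 14:26:23;0001;PBS_Server;Svr;PBS_Server;Success (0) in "
        "stream_eof, connection to xyz dropped.  setting node state to down "
        "in stream_eof",
        "11/27/2006 14:26:23;0002;   pbs_mom;n/a;mom_main;hello sent to server abc",
    ]
    print(dropped_nodes(sample))
```

Any node that shows up here is running a MOM the old server can't talk to, so it will keep getting marked down until its MOM matches the server version.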


