[torqueusers] Transparently upgrading Torque?

Michael Durket durket at hw-durket.stanford.edu
Mon Nov 27 15:32:07 MST 2006


I'm trying to upgrade our old torque-1.0.1p5 installation
to torque-2.1.6. Because the current installation runs our
production workload continuously, I've installed the new
version on a separate machine. The older MOMs won't talk to
multiple servers, so I installed a new MOM on one of our
unused execution nodes. As expected, it talks to the new
release of Torque, but it appears to have problems with the
older Torque, which generates these messages in the Torque
log:

11/27/2006 14:26:23;0001;PBS_Server;Svr;PBS_Server;Success (0) in stream_eof, connection to xyz dropped.  setting node state to down in stream_eof

which seems due to the MOM on node xyz doing this:

11/27/2006 14:26:23;0002;   pbs_mom;n/a;mom_main;connection to server abc timeout
11/27/2006 14:26:23;0002;   pbs_mom;n/a;mom_main;hello sent to server abc

I'm assuming this is actually a problem in the PBS server code, since it
seems to be reporting "Success" as if it were an error. Is there any way
around this, short of fixing the old code? I need to keep the old system
running in production while I test the new one, but I can't test the new
Torque with the old MOMs, and it now appears the old Torque won't work with
the new MOMs.
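
To line up the server and MOM entries quoted above, a small script along
these lines might help. It is only a sketch: it assumes each log line has six
semicolon-separated fields (timestamp, event code, daemon, class, object,
message), which matches the excerpts above but is not taken from the Torque
documentation. The names LogRecord, parse_line and stream_eof_drops are
hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    # Field names are assumptions inferred from the quoted log lines.
    timestamp: str
    code: str
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # Split on the first five semicolons only, so the message stays intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*(p.strip() for p in parts))


def stream_eof_drops(lines: Iterable[str]) -> List[LogRecord]:
    """Return the records whose message mentions stream_eof."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and "stream_eof" in r.message]


# The three log lines quoted in this message.
SAMPLE = [
    "11/27/2006 14:26:23;0001;PBS_Server;Svr;PBS_Server;Success (0) in "
    "stream_eof, connection to xyz dropped.  setting node state to down "
    "in stream_eof",
    "11/27/2006 14:26:23;0002;   pbs_mom;n/a;mom_main;connection to server abc timeout",
    "11/27/2006 14:26:23;0002;   pbs_mom;n/a;mom_main;hello sent to server abc",
]

if __name__ == "__main__":
    for rec in stream_eof_drops(SAMPLE):
        print(rec.timestamp, rec.daemon, rec.message)
```

Feeding it the concatenated server and MOM logs would list each dropped
connection with its timestamp, which can then be matched against the MOM's
"timeout" and "hello sent" entries by hand.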


