[torqueusers] Release of TORQUE 2.3.3

Josh Butikofer josh at clusterresources.com
Fri Aug 15 11:34:18 MDT 2008


We have released a new official version of TORQUE: 2.3.3. This release comes so soon after 2.3.2
(which was released only two weeks ago) because we discovered a serious bug that could prevent pbs_mom
daemons from properly reconnecting to pbs_server after a network disruption. For some users, this
left large numbers of compute nodes in the "down" state, and the only way to recover them was to
restart the pbs_mom daemon manually.

If you experience this issue with TORQUE 2.3.2 but cannot yet upgrade to 2.3.3, restarting pbs_mom
on the affected compute node should bring it back to full health.

A list of changes in 2.3.3 follows:

c - crash     b - bug fix    e - enhancement    f - new feature
   b - fixed bug where pbs_mom would sometimes not connect properly with pbs_server after network disruption
   b - changed run_pelog so that it opens the correct stdout/stderr when join is used
   b - corrected pbs_server man page for SIGUSR1 and SIGUSR2
   f - added new pbs_track command, which launches an external process; pbs_mom then tracks the
       resource usage of that process and attaches it to a specified job (experimental)
       (special thanks to David Singleton and David Houlder from APAC)
   e - added alternate method for sending cluster addresses to mom (compile with -DALT_CLSTR_ADDR)

Thanks again to all who have been helping out with TORQUE development, submitting bugs, answering 
questions, and giving feedback about 2.3.2!


Josh Butikofer
Cluster Resources, Inc.
