[torqueusers] TORQUE 2.1.6 and 2.0.0p11 are released

Lennart Karlsson Lennart.Karlsson at nsc.liu.se
Fri Oct 27 07:17:16 MDT 2006

> TORQUE 2.1.6 is now available.  This fixes all known security problems
> and regressions from Friday's release.  TM on single-node jobs and job
> reruns are working again.

I have now installed it on four of our clusters and found
that something has broken: the momrestart facility.

As usual, I installed the new pbs_mom binaries (with 'make install'),
and when a node has finished all its jobs, the pbs_mom on that
node logs, e.g.:

pbs_mom;Job;368028.moonwatch;job was terminated
pbs_mom;Svr;pbs_mom;Will be restarting: /usr/pbs/sbin/pbs_mom
pbs_mom;Svr;pbs_mom;Is down
pbs_mom;Svr;Log;Log closed

and there is no pbs_mom process any longer. Big surprise! My
clusters are closing down automatically in this way, node by
node. :-(

I was very happy with the automatic momrestarts as long as they worked.
Perhaps they can be repaired? For now I am unconfiguring them.

-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
   National Supercomputer Centre in Linkoping, Sweden
