[torqueusers] A possibly useful Torque hint
Chris Samuel
csamuel at vpac.org
Tue Oct 26 00:48:01 MDT 2004
Here's something we've found today at VPAC that may be of use to some folks.
We had a couple of nodes on which the installer hadn't set the NTP daemon to
start on boot, and which had been powered down for a while due to hardware
problems. We recently brought these back online, and after a day or so
noticed that their clocks had drifted significantly whilst they were off, so
we restarted NTP.
After a while we noticed that Torque's pbs_server had flagged two nodes as
being down (as seen by 'pbsnodes -l'), even though pbs_mom was still running
on each node.
Looking at the moms with 'momctl -d3' we saw:
Last Msg To Server: -25873 seconds
Of course the NTP start script had reset the time and the mom was stuck in a
time warp. :-)
Restarting the mom fixed the problem.
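
For anyone who wants to check for this automatically, here is a rough sketch
in Python. It only parses text in the form of the single 'momctl -d3' line
quoted above; the regular expression and the function names (last_msg_age,
mom_clock_stepped_back, format_offset) are my own illustrative assumptions,
not part of Torque, and other Torque versions may print the line differently.
A negative age suggests the clock was stepped backwards after the mom last
contacted the server.

```python
import re

# Assumed format, taken from the one line quoted in this post.
LAST_MSG_RE = re.compile(r"Last Msg To Server:\s*(-?\d+)\s*seconds")


def last_msg_age(momctl_output):
    """Return the reported seconds since the last message, or None if absent."""
    match = LAST_MSG_RE.search(momctl_output)
    return int(match.group(1)) if match else None


def mom_clock_stepped_back(momctl_output):
    """True if the reported age is negative, i.e. the mom looks time-warped."""
    age = last_msg_age(momctl_output)
    return age is not None and age < 0


def format_offset(seconds):
    """Render a signed number of seconds as hours, minutes and seconds."""
    sign = "-" if seconds < 0 else ""
    hours, rem = divmod(abs(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{sign}{hours}h{minutes:02d}m{secs:02d}s"


sample = "Last Msg To Server: -25873 seconds"
if mom_clock_stepped_back(sample):
    print("mom looks time-warped by", format_offset(last_msg_age(sample)))
```

In practice the text would come from running 'momctl -d3' against each node;
that part is left out here because it depends on the local setup.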
cheers,
Chris
--
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia