[torqueusers] A possibly useful Torque hint

Chris Samuel csamuel at vpac.org
Tue Oct 26 00:48:01 MDT 2004


Here's something we've found today at VPAC that may be of use to some folks.

We had a couple of nodes on which the installer hadn't set the NTP daemon to 
start on boot, and which had been powered down for a while due to hardware 
problems.  We recently brought these back online and, after a day or so, 
noticed that their clocks had drifted significantly whilst they were off, so 
we restarted NTP.

After a while we noticed that Torque's pbs_server had flagged two nodes as 
being down (as seen by 'pbsnodes -l'), even though pbs_mom was still running 
on both of them.

Looking at the moms with 'momctl -d3' we saw:

Last Msg To Server:    -25873 seconds

Of course the NTP start script had reset the time and the mom was stuck in a 
time warp. :-)
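
As a rough illustration of how a figure like that can arise, here is a small 
Python sketch. It assumes the mom derives the value by subtracting a stored 
wall-clock timestamp from the current time; that is an assumption, not 
something checked against the Torque source. The epoch value, step size and 
elapsed time are made up so that the result matches the number above.

```python
from datetime import timedelta


def message_age(now: int, last_sent: int) -> int:
    """Age of the last message in seconds: wall-clock now minus stored timestamp."""
    return now - last_sent


# Hypothetical values: the mom stamps a message, then the clock is stepped back.
last_sent = 1_098_800_000   # epoch seconds when the message was sent (made up)
step_back = 25_900          # seconds the clock was set back (assumed)
elapsed = 27                # real seconds that passed since the message (assumed)

now = last_sent - step_back + elapsed
age = message_age(now, last_sent)

print(age)                          # -25873
print(timedelta(seconds=abs(age)))  # 7:11:13
```

Under that assumption, -25873 seconds would correspond to the clock having 
been set back by a little over seven hours.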

Restarting the mom fixed the problem.
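
If you want to spot this condition without eyeballing the output, something 
like the following Python sketch might do. It only scans text you have 
already captured from 'momctl -d3'; the pattern is based solely on the one 
line quoted above, so other Torque versions may format it differently, and 
the function name is just for illustration.

```python
import re
from typing import List

# Pattern assumed from the single output line quoted in this message.
LAST_MSG = re.compile(r"Last Msg To Server:\s*(-?\d+)\s+seconds")


def negative_last_msg(diag_text: str) -> List[int]:
    """Return every negative 'Last Msg To Server' value found in the text."""
    values = [int(m.group(1)) for m in LAST_MSG.finditer(diag_text)]
    return [v for v in values if v < 0]


sample = "Last Msg To Server:    -25873 seconds\n"
print(negative_last_msg(sample))  # [-25873]
```

A non-empty result suggests the mom on that node is worth restarting.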

cheers,
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
