[torqueusers] Re: pbs_mom caches last healthcheck script error ? (Re: [Moabusers] Moab keeps on trying after pbs_mom rejects.)

Chris Samuel csamuel at vpac.org
Mon Dec 4 16:38:46 MST 2006


On Tuesday 05 December 2006 10:28, Garrick Staples wrote:

> Then your health check script is returning the error.

But it isn't, that's why we're bemused - we can run the script until we're 
blue in the face and it doesn't return anything at all!

# /usr/local/sbin/moab-check-health.sh
#

The other nodes in the cluster are all fine, and they're running the same 
script.

Hmm, hang on a tic..

ARGH!  %^!&#r^(*#%^ Fedora.

The two "special" nodes that have this problem are running FC6 (for hardware 
reasons), the rest are running FC5.

If you run the script as root then you get the above (fine) response.

If you run the script as a normal user you get a message saying it can't 
find lspci, and so the script was generating the error message because the 
grep for the characteristic showing the card was in 64-bit mode wasn't 
finding anything!
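A minimal sketch of that failure mode, assuming the check has the form "command | grep pattern". Here `no_such_lspci` is a hypothetical stand-in for an lspci that isn't on the caller's PATH, and the "64-bit" pattern is only a guess at what the real script greps for. The shell's "not found" complaint goes to stderr, so grep sees empty input, fails, and the check reports a hardware fault that isn't there:

```shell
#!/bin/sh
# Sketch only: "no_such_lspci" and the "64-bit" pattern are hypothetical.
check() {
    # Run the given command; its stderr (including "not found") is discarded,
    # so grep only ever sees stdout, which is empty if the command is missing.
    if ! "$@" 2>/dev/null | grep -q "64-bit"; then
        echo "ERROR card not in 64-bit mode"
    fi
}

# A missing command produces no output, so the grep fails and we get a
# spurious hardware error rather than a "command not found" diagnosis.
check no_such_lspci
```

When the command does exist and prints the expected text (e.g. `check printf 'bus: 64-bit\n'`), the check stays silent, which is why the same script looked fine when run as root.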

For some reason this is only happening on the FC6 nodes; no idea why..

Brett's fixed his script to have full paths to the commands and they've come 
back online quite happily!
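A sketch of that kind of fix, not Brett's actual script: every command is called by its full path, and a missing binary is reported as such rather than as a hardware fault. The /sbin/lspci and /bin/grep locations, the -vv flag, the "64-bit" pattern and the health_check function name are all assumptions. It also assumes the TORQUE convention that a line of output beginning with ERROR marks the node down.

```shell
#!/bin/sh
# Sketch only: paths, pattern and function name are assumptions, not the
# original VPAC script.
health_check() {
    # Full path to lspci (overridable via the first argument), so the result
    # doesn't depend on the caller's $PATH.
    lspci_bin=${1:-/sbin/lspci}

    # Say plainly when the tool itself is missing instead of blaming the card.
    if [ ! -x "$lspci_bin" ]; then
        echo "ERROR cannot execute $lspci_bin"
        return 0
    fi

    # grep is also called by full path; no output means the node is healthy.
    if ! "$lspci_bin" -vv 2>/dev/null | /bin/grep -q "64-bit"; then
        echo "ERROR PCI card not reporting 64-bit mode"
    fi
}

health_check
```

Passing a different path as the first argument, e.g. `health_check /usr/sbin/lspci`, makes it easy to adapt for distributions that install lspci elsewhere.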

Open mouth, remove foot..

Sorry about this Garrick!

cheers,
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia


