[torqueusers] Missing home directory causes pbs_mom to crash ?

Chris Samuel csamuel at vpac.org
Wed Oct 27 21:29:39 MDT 2004


One thing we see occasionally (for instance when a node loses access to home
directories due to stale NFS file handles) is that any job submitted to an
affected node will cause pbs_mom on that node to crash, and in the logs we see:

10/28/2004 09:19:42;0001;   pbs_mom;Svr;pbs_mom;No such file or directory (2) 
in fork_to_user, invalid home directory '/home/san02/rajdas' specified, 
errno=2 (No such file or directory)
10/28/2004 09:19:42;0080;   pbs_mom;Req;req_reject;Reject reply code=15035
( MSG=invalid home directory '/home/san02/rajdas' specified, errno=2 (No such 
file or directory)), aux=0, type=54, from PBS_Server at mgtnode

This seems to be similar to the error that Roy is seeing, though not
identical. We don't use automounters, so we see it only when we're having
NFS problems.

This is the same snapshot of Torque as Roy's (torque-1.1.0p4-snap.1098376627).

The odd thing is that ours crashes the mom whilst his doesn't, and that may be 
because on his systems his automounter has mounted the directory whilst on 
ours it's still absent.
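
For anyone wanting to check whether a node is in this state before jobs land
on it, here is a minimal sketch in Python. It is not Torque's code and not
what fork_to_user actually does; the helper name check_home_dir is
hypothetical, and the message format only imitates the log line above. It
assumes that a plain stat() of the home directory is enough to surface the
problem (ENOENT for a missing directory, typically ESTALE for a stale NFS
file handle).

```python
import errno
import os


def check_home_dir(path):
    """Return (ok, message) describing whether `path` is a usable directory.

    Hypothetical helper for illustration only; it is not part of Torque.
    """
    try:
        # stat() fails with ENOENT if the directory is missing and
        # typically with ESTALE if the NFS file handle has gone stale.
        os.stat(path)
    except OSError as exc:
        return False, "invalid home directory '%s' specified, errno=%d (%s)" % (
            path, exc.errno, os.strerror(exc.errno))
    if not os.path.isdir(path):
        return False, "'%s' exists but is not a directory" % path
    return True, "ok"


if __name__ == "__main__":
    # Example path taken from the log above; substitute a real user's home.
    ok, message = check_home_dir("/home/san02/rajdas")
    print(ok, message)
```

Run on each compute node (from cron or a node health check, say), something
like this would at least show which nodes have lost the home directories
before a job is sent to them.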

I'm trying to work further on making a reproducible test case for a crash.

cheers!
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
