[torqueusers] pbs_mom unable to chdir to automounted dirs

Mary Ellen Fitzpatrick mfitzpat at bu.edu
Wed Oct 22 12:37:07 MDT 2008

Yeah, that is why I am stumped: I can cd to the nfs dirs, so autofs 
seems to be working correctly.  But unless the nfs dir is already 
mounted, pbs_mom cannot find it.  Very strange...

Yes, getent passwd gives the correct home dir info
[root@node1048 mom_priv]# getent passwd
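
For anyone who wants to repeat the check described above, here is a minimal sketch (not something from the original exchange). It looks up a user's home directory with getent and then tries to chdir into it from a subshell. The function names check_chdir and check_home are made up for illustration. On an autofs path the chdir attempt should itself trigger the mount, so a failure right after the mount has timed out would point at the automounter rather than at the passwd data.

```shell
#!/bin/sh
# Hypothetical helper: try to chdir into a directory inside a subshell
# and report the result. The subshell keeps the caller's cwd unchanged.
check_chdir() {
    dir="$1"
    if (cd "$dir") 2>/dev/null; then
        echo "ok: $dir"
        return 0
    else
        echo "FAILED: $dir"
        return 1
    fi
}

# Hypothetical helper: resolve a user's home directory (field 6 of the
# passwd entry returned by getent) and test whether chdir works there.
check_home() {
    user="$1"
    home=$(getent passwd "$user" | cut -d: -f6)
    if [ -z "$home" ]; then
        echo "no passwd entry for $user"
        return 2
    fi
    check_chdir "$home"
}
```

Running "check_home mfitzpat" on the node once the mount has expired, and again immediately afterwards, would show whether the first access fails outside of pbs_mom as well.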

Luke Scharf wrote:
> Nothing that you mention looks amiss at first glance...
> Does the "getent passwd" information for the user have a correct home 
> directory on the node?
> -Luke
> Mary Ellen Fitzpatrick wrote:
>> Thanks Luke.
>> Right now, my cluster is one node, with an additional 50 to be brought 
>> on-line once I resolve the automount problem.  The job I am running 
>> is very simple, no nfs load on server.
>> I believe my $usecp is correct; it works properly once the nfs dir 
>> is mounted.
>> $usecp *:/fs/userB1 /fs/userB1
>> My auto.home file:
>> userB1  -rw,hard,intr   userB:/userB/u1
>> auto.master file:
>> #+auto.master
>> /fs     /etc/auto.home
>> I believe it is an automount issue and I need to tweak a parameter in 
>> a config file.  Not sure which one it is at this point.
>> Luke Scharf wrote:
>>> Mary Ellen Fitzpatrick wrote:
>>>> I have my home dirs nfs exported to all of my compute nodes.  I can 
>>>> log into the nodes and cd to the nfs mounted dirs, no problem. When I 
>>>> submit a job to a node and the automounted nfs dirs are not mounted 
>>>> (timed out), I get the following error:
>>>> Oct 21 16:08:14 node1047 pbs_mom: No such file or directory (2) in 
>>>> TMomFinalizeChild, PBS: chdir to '/fs/userB1/mfitzpat' failed: No 
>>>> such file or directory
>>>> If I immediately resubmit the job to the same node, it will run.  
>>>> It appears that pbs wants the automounted nfs dirs to be already 
>>>> mounted, then the job will run.  If I hard mount the nfs home dirs, 
>>>> I have no problem running the jobs, but I do not want to do that.
>>>> Anyone run into this?  Trying to figure out if it is a torque 
>>>> issue or automount issue.
>>> How big is your cluster?  How capable is the NFS server?  A 
>>> job-start is likely to create a mountstorm, and generate a lot of 
>>> I/O.  Some servers can handle it, some can't.
>>> Yay for scaling issues!
>>> -Luke
>>> P.S. I second the suggestion of checking the $usecp value.

Mary Ellen
