[torqueusers] pbs_mom unable to chdir to automounted dirs

Mary Ellen Fitzpatrick mfitzpat at bu.edu
Wed Oct 22 12:19:23 MDT 2008


Thanks Luke.
Right now, my cluster is one node, with an additional 50 to be brought 
online once I resolve the automount problem.  The job I am running is 
very simple, so there is no NFS load on the server.

I believe my $usecp is correct; it works properly once the NFS dir is 
mounted.

$usecp *:/fs/userB1 /fs/userB1
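
For readers unfamiliar with $usecp, the sketch below illustrates the general idea of such a rule: a destination of the form host:/path is matched against a host pattern and a path prefix, and on a match the path is rewritten to a local prefix so that a plain cp can be used. This is a simplified model and not pbs_mom's actual code; the function name usecp_local_path and the exact matching semantics (shell-style wildcard on the host, plain prefix match on the path) are assumptions made for illustration.

```python
from fnmatch import fnmatch


def usecp_local_path(remote, rules):
    """Return the local path a matching rule maps to, or None if none match.

    remote: a 'host:/path' string, as in a stage-out destination.
    rules:  list of (host_pattern, remote_prefix, local_prefix) tuples,
            one tuple per $usecp line.
    NOTE: hypothetical helper; real pbs_mom matching may differ in detail.
    """
    host, _, path = remote.partition(":")
    for host_pat, remote_prefix, local_prefix in rules:
        # Host is matched with a shell-style wildcard, path by plain prefix.
        if fnmatch(host, host_pat) and path.startswith(remote_prefix):
            return local_prefix + path[len(remote_prefix):]
    return None


# Rule corresponding to the line: $usecp *:/fs/userB1 /fs/userB1
rules = [("*", "/fs/userB1", "/fs/userB1")]
```

Under this model, any path below /fs/userB1 on any host maps to the same path locally, which only helps if /fs/userB1 is actually mounted on the node at the time of the copy.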

My auto.home file:
userB1  -rw,hard,intr   userB:/userB/u1

auto.master file:
#+auto.master
/fs     /etc/auto.home

I believe it is an automount issue and I need to tweak a parameter in a 
config file.  Not sure which one it is at this point.
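
Since an immediate resubmit succeeds, one possible explanation is that the first lookup of the directory starts the mount but fails before the mount is ready. That is a guess, not a confirmed diagnosis. Under that assumption, a minimal Python sketch of a workaround is to look the directory up a few times, with a short pause, before the job needs it. The function name wait_for_dir and the attempt and delay values are hypothetical, and whether a site script such as a Torque prologue runs early enough to help here has not been verified.

```python
import os
import time


def wait_for_dir(path, attempts=5, delay=1.0):
    """Look up `path` until it is a directory or the attempts run out.

    Accessing a path under an autofs mount point is what asks the
    automounter to mount it; the retry loop covers the case where the
    first lookup fails before the mount has completed.
    Returns True if the directory became visible, False otherwise.
    """
    for attempt in range(attempts):
        if os.path.isdir(path):
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For the directory in the error message, the call would be wait_for_dir('/fs/userB1/mfitzpat'); a False result after several attempts would point at the automounter or the NFS server rather than at Torque.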


Luke Scharf wrote:
> Mary Ellen Fitzpatrick wrote:
>> I have my home dirs NFS-exported to all of my compute nodes.  I can 
>> log into the nodes and cd to the NFS-mounted dirs, no problem.  When I 
>> submit a job to a node and the automounted NFS dirs are not mounted 
>> (timed out), I get the following error:
>>
>> Oct 21 16:08:14 node1047 pbs_mom: No such file or directory (2) in 
>> TMomFinalizeChild, PBS: chdir to '/fs/userB1/mfitzpat' failed: No 
>> such file or directory
>>
>> If I immediately resubmit the job to the same node, it will run.  It 
>> appears that PBS needs the automounted NFS dirs to be mounted already 
>> before the job will run.  If I hard-mount the NFS home dirs, I have 
>> no problem running the jobs, but I do not want to do that.
>>
>> Has anyone run into this?  I am trying to figure out whether it is a 
>> Torque issue or an automount issue.
>
> How big is your cluster?  How capable is the NFS server?  A job-start 
> is likely to create a mountstorm, and generate a lot of I/O.  Some 
> servers can handle it, some can't.
>
> Yay for scaling issues!
>
> -Luke
>
> P.S. I second the suggestion of checking the $usecp value.

-- 
Thanks
Mary Ellen


