[torqueusers] pbs_mom unable to chdir to automounted dirs
Mary Ellen Fitzpatrick
mfitzpat at bu.edu
Wed Oct 22 12:19:23 MDT 2008
Thanks Luke.
Right now my cluster is a single node, with an additional 50 to be
brought online once I resolve the automount problem. The job I am
running is very simple, and there is no NFS load on the server.
I believe my $usecp is correct; it works properly once the NFS dir is
mounted.
$usecp *:/fs/userB1 /fs/userB1
My auto.home file:
userB1 -rw,hard,intr userB:/userB/u1
auto.master file:
#+auto.master
/fs /etc/auto.home
I believe it is an automount issue and that I need to tweak a parameter
in a config file. I am not sure which one at this point.
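
One way to narrow this down is to check, by hand on the node, whether the
first access to the directory fails and a later one succeeds while the
automounter catches up. The following is only a sketch under that
assumption: wait_for_dir is a hypothetical helper, not part of Torque or
autofs, and the retry count and delay are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or autofs):
#   wait_for_dir DIR [TRIES] [DELAY_SECONDS]
# Accessing the path is what prompts the automounter to mount it, so
# retry a cd until it succeeds or the attempts run out.
wait_for_dir() {
    dir=$1
    tries=${2:-5}
    delay=${3:-1}
    i=1
    while [ "$i" -le "$tries" ]; do
        # The subshell keeps the caller's working directory unchanged.
        if (cd "$dir") 2>/dev/null; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, after the mount has timed out:

    wait_for_dir /fs/userB1/mfitzpat 5 2 || echo "still not mounted" >&2

If the first attempt fails and a later one succeeds, that would point at
the automounter's response time rather than at Torque.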
Luke Scharf wrote:
> Mary Ellen Fitzpatrick wrote:
>> I have my home dirs NFS-exported to all of my compute nodes. I can
>> log into the nodes and cd to the NFS-mounted dirs, no problem. When I
>> submit a job to a node and the automounted NFS dirs are not mounted
>> (timed out), I get the following error:
>>
>> Oct 21 16:08:14 node1047 pbs_mom: No such file or directory (2) in
>> TMomFinalizeChild, PBS: chdir to '/fs/userB1/mfitzpat' failed: No
>> such file or directory
>>
>> If I immediately resubmit the job to the same node, it will run. It
>> appears that PBS needs the automounted NFS dirs to be mounted already
>> before the job will run. If I hard mount the NFS home dirs, I have no
>> problem running the jobs, but I do not want to do that.
>>
>> Has anyone run into this? I am trying to figure out whether it is a
>> Torque issue or an automount issue.
>
> How big is your cluster? How capable is the NFS server? A job-start
> is likely to create a mountstorm, and generate a lot of I/O. Some
> servers can handle it, some can't.
>
> Yay for scaling issues!
>
> -Luke
>
> P.S. I second the suggestion of checking the $usecp value.
--
Thanks
Mary Ellen