[torqueusers] Problem with Torque with AMD Opteron and RHEL 3

Valery Mitsyn vvm at mammoth.jinr.ru
Thu Dec 9 10:01:51 MST 2004


Did you try creating /etc/hosts.equiv and adding the server and all
interactive nodes (FQDN, official name and IP address) to it?
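A minimal sketch of what such a file could contain, assuming one entry per line and that each host is listed by FQDN, short name and IP address. The helper name `hosts_equiv_lines` and the host names and addresses below are hypothetical placeholders, not values from this thread, and whether your rsh/rcp implementation accepts IP addresses in hosts.equiv is worth checking locally.

```python
def hosts_equiv_lines(hosts):
    """Build hosts.equiv entries from (fqdn, ip) pairs.

    For each host, emit the FQDN, the short name (text before the
    first dot) and the IP address, skipping duplicates.
    """
    lines = []
    for fqdn, ip in hosts:
        short = fqdn.split(".", 1)[0]
        for entry in (fqdn, short, ip):
            if entry and entry not in lines:
                lines.append(entry)
    return lines


if __name__ == "__main__":
    # Hypothetical server and node; replace with your own hosts.
    example = [
        ("server.example.org", "192.0.2.1"),
        ("node002.example.org", "192.0.2.12"),
    ]
    # Print the entries; review them before writing /etc/hosts.equiv.
    print("\n".join(hosts_equiv_lines(example)))
```

The output is only printed, so it can be reviewed before anything is written to /etc/hosts.equiv on the server and nodes.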

On Thu, 9 Dec 2004, Leandro Tavares Carneiro wrote:

> Bas,
>
> I have tried changing the permissions of a user's home directory to 777
> and the behavior is the same, but it is worse with the p5 snapshot.
>
> With p3, which is the version that works on the other clusters we have here,
> with the same users, I can run a job on one machine. It works, but when I
> use more than one, it doesn't work.
>
> I have done some tests using local user accounts and it works. I have also
> exported a home area for this user from a Linux server *without* the
> no_root_squash parameter. In fact, I used root_squash to enforce that,
> and it works correctly.
>
> I think the problem is somewhere else, and that the chmod of the home area or
> the export with no_root_squash is a coincidence.
>
> I hope someone can help me. I'm in trouble because of that cluster.
>
> Thanks for your help,
>
> Regards,
>
> Leandro Tavares Carneiro
> Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
> Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
> Tel: (0xx21) 2534-1427
>
>
> Bas van der Vlies wrote:
> > Dave Jackson wrote:
> >> Bas,
> >>
> >>   This should be easy to patch but we have so far been unable to
> >> reproduce it in our lab with or without root squash.  If any site can
> >> reliably reproduce it and is able to work with us, we can most likely
> >> correct this today.
> >>
> >
> > Dave,
> >
> >  It is easy for me to reproduce. I just chmod 700 my home directory.
> >  Or must I try the new p5 snapshot on one node?
> >
> >
> >  We have a timezone difference ;-)
> >
> >
> >>
> >> On Wed, 2004-12-08 at 03:58, Bas van der Vlies wrote:
> >>
> >>> We at SARA have the same problem. I have turned on root_squash. The
> >>> problem disappeared if I made my home directory 755. But that is not a
> >>> real solution. We are using torque 1.1.0p4.
> >>>
> >>>         Regards
> >>>
> >>> Leandro Tavares Carneiro wrote:
> >>>
> >>>> Chris,
> >>>>
> >>>> I can see the home directory of all users, but I don't have it
> >>>> exported with the no_root_squash parameter because we didn't need it
> >>>> before, and this home area is served to the users by some NetApp
> >>>> filers.
> >>>>
> >>>> We have other clusters here with many more nodes and we never
> >>>> had this problem. The other clusters are Xeon and the OS is the old
> >>>> RedHat. This problem only happens with this Opteron/RHEL WS cluster.
> >>>>
> >>>> Thanks for your help,
> >>>>
> >>>> Regards,
> >>>>
> >>>> Leandro Tavares Carneiro
> >>>> Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
> >>>> Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
> >>>> Tel: (0xx21) 2534-1427
> >>>>
> >>>>
> >>>> Chris Samuel wrote:
> >>>>
> >>>>
> >>>>> On Tue, 7 Dec 2004 10:18 pm, Leandro Tavares Carneiro wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>       I have checked everything on my nodes and server and
> >>>>>> everything is OK. All the nodes can recognize the user id I'm using
> >>>>>> and the home directory is mounting, but I still get this error.
> >>>>>>
> >>>>>> Dec  7 09:04:32 node002 pbs_mom: scan_for_exiting, cannot chdir to user home directory
> >>>>>
> >>>>>
> >>>>>
> >>>>> Are you exporting the users' home directories with no_root_squash
> >>>>> from the NFS server?
> >>>>>
> >>>>> The easiest way to check that is to log in to node002 as root and then
> >>>>> try to cd to the user's home directory. If you get a permission
> >>>>> denied error, this is probably what's going on.
> >>>>>
> >>>>> A number of folks have reported this recently, it doesn't affect us
> >>>>> here as we're exporting with no_root_squash (we have total control
> >>>>> over all clients and server).
> >>>>>
> >>>>> The other time we've seen this is after an NFS server crash when
> >>>>> the clients have stale NFS file handles, again trying the above
> >>>>> should tell you.
> >>>>>
> >>>>> It would be very nice if the pbs_mom reported the value of errno
> >>>>> and its sys_errlist equivalent. :-)
> >>>>>
> >>>>> cheers,
> >>>>> Chris
> >>>>>
> >>>>>
> >>>>> ------------------------------------------------------------------------
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> torqueusers mailing list
> >>>>> torqueusers at supercluster.org
> >>>>> http://supercluster.org/mailman/listinfo/torqueusers
> >>>>
> >>>
> >>
> >
> >
>
>
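The check Chris Samuel describes in the quoted thread (log in to a node as root, try to cd to the user's home directory and look at the error) can be sketched as follows. This is only an illustration, not code from pbs_mom; the function name `check_chdir` is hypothetical, and it reports the errno name and message that the quoted log line leaves out.

```python
import errno
import os


def check_chdir(path):
    """Try to chdir into path and report the outcome.

    Returns (True, None, None) on success, otherwise
    (False, errno_name, message), e.g. (False, "EACCES", "Permission denied").
    """
    old = os.getcwd()
    try:
        os.chdir(path)
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one is known.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, name, os.strerror(exc.errno)
    # Restore the previous working directory after a successful chdir.
    os.chdir(old)
    return True, None, None


if __name__ == "__main__":
    # Assumption: run this on a compute node as root and point it at the
    # affected user's home directory; here the caller's own home is used.
    home = os.path.expanduser("~")
    ok, name, message = check_chdir(home)
    if ok:
        print("chdir to %s succeeded" % home)
    else:
        print("chdir to %s failed: %s (%s)" % (home, name, message))
```

Run as root on a node, an EACCES result would be consistent with root squashing on a directory that is not world-searchable, while ESTALE would point to a stale NFS file handle.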

Best regards,
 Valery Mitsyn

