[torqueusers] Problem with Torque with AMD Opteron and RHEL 3

Leandro Tavares Carneiro leandro at ep.petrobras.com.br
Wed Dec 8 07:20:58 MST 2004


I will try, but this is not an option for me... I have thousands of users
with their home areas split over several NetApp filers, and I think I
*can't* do that! Maybe for one user as a test, but for my whole environment
it is impossible....

Anyway, this only happens with this small cluster, and this cluster is
not in "real production"; for now it is dedicated to development of the
applications for this architecture. It will go into production sometime in
the near future, but not now.

Anyway, thanks for the help,

Regards,

Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 2534-1427


Bas van der Vlies wrote:
> We at SARA have the same problem. I have turned on root_squash. The
> problem disappeared if I made my home directory 755. But that is not a
> real solution. We are using torque 1.1.0p4.
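A possible reason the 755 workaround helps: with root_squash, root on the client is mapped to an anonymous user on the server, so it can only enter a home directory if the "other" class has search (execute) permission. The sketch below checks just that one bit. The function name `others_can_enter` is made up for illustration, and the check ignores ACLs and the permissions of parent directories, which also matter.

```python
import os
import stat


def others_can_enter(path):
    """Return True if the 'other' class has search (execute) permission on path."""
    # S_IMODE strips the file-type bits and leaves only the permission bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # S_IXOTH is the o+x bit: set for mode 755, clear for mode 700.
    return bool(mode & stat.S_IXOTH)
```

Run against a user's home directory on a compute node, a False result would be consistent with a squashed root being unable to chdir there.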
> 
>         Regards
> 
> Leandro Tavares Carneiro wrote:
> 
>> Chris,
>>
>> I can see the home directories of all users, but I don't have them exported
>> with the no_root_squash parameter because we didn't need it before, and this
>> home area is served to the users by some NetApp filers.
>>
>> We have other clusters here with many more nodes and we never had
>> this problem. The other clusters are Xeon and the OS is the old RedHat.
>> This problem only happens with this Opteron/RHEL WS cluster.
>>
>> Thanks for your help,
>>
>> Regards,
>>
>> Leandro Tavares Carneiro
>> Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
>> Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
>> Tel: (0xx21) 2534-1427
>>
>>
>> Chris Samuel wrote:
>>
>>> On Tue, 7 Dec 2004 10:18 pm, Leandro Tavares Carneiro wrote:
>>>
>>>
>>>>        I have checked everything on my nodes and server and everything
>>>> is OK. All the nodes recognize the user ID I'm using and the home
>>>> directory is mounted, but I still get this error.
>>>>
>>>> Dec  7 09:04:32 node002 pbs_mom: scan_for_exiting, cannot chdir to user
>>>> home directory
>>>
>>> Are you exporting the users' home directories with no_root_squash from
>>> the NFS server?
>>>
>>> The easiest way to check that is to log in to node002 as root and then try
>>> to cd to the user's home directory. If you get a permission denied
>>> error, this is probably what's going on.
>>>
>>> A number of folks have reported this recently. It doesn't affect us
>>> here as we're exporting with no_root_squash (we have total control
>>> over all clients and server).
>>>
>>> The other time we've seen this is after an NFS server crash when the 
>>> clients have stale NFS file handles, again trying the above should 
>>> tell you.
>>>
>>> It would be very nice if the pbs_mom reported the value of errno and 
>>> its sys_errlist equivalent. :-)
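The root cd check suggested above, combined with the errno reporting wished for here, can be sketched in a few lines. This is only an illustration and not pbs_mom's actual code; the function name `try_chdir` and the path `/home/someuser` are placeholders. Run as root on the affected node, it reports the symbolic errno name and message for a failed chdir.

```python
import errno
import os


def try_chdir(path):
    """Try to chdir into path; return (ok, errno_name, message)."""
    original = os.getcwd()
    try:
        os.chdir(path)
    except OSError as exc:
        # errno.errorcode maps the number to its symbolic name (e.g. EACCES);
        # os.strerror returns the text that C's strerror() would give.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, name, os.strerror(exc.errno)
    os.chdir(original)  # restore the working directory on success
    return True, None, None


if __name__ == "__main__":
    # "/home/someuser" is a placeholder; substitute a real user's home.
    print(try_chdir("/home/someuser"))
```

On Linux, a permission problem such as root squashing would typically show up as EACCES, and a stale NFS file handle as ESTALE, which would tell the two cases in this thread apart.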
>>>
>>> cheers,
>>> Chris
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://supercluster.org/mailman/listinfo/torqueusers
> 
> 
> 

