[torqueusers] Problem with Torque with AMD Opteron and RHEL 3

Leandro Tavares Carneiro leandro at ep.petrobras.com.br
Thu Dec 9 03:49:46 MST 2004


Bas,

I tried changing the permissions of a user's home directory to 777 and the
behavior is the same, but it is worse with the p5 snapshot.

With p3, which is the version working on the other clusters we have here
with the same users, I can run a job on one machine. That works, but when I
use more than one machine, it doesn't work.

I have run some tests using local user accounts and they work. I have also
exported a home area for this user from a Linux server *without* the
no_root_squash parameter. In fact, I used root_squash to enforce that, and
it works correctly.

I think the problem lies elsewhere, and that the effect of chmod on the home
area or of exporting with no_root_squash is a coincidence.
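
To narrow this down, the chdir that pbs_mom complains about in the log quoted
below can be tried by hand on a node, once as root and once as the user. The
following is only a minimal sketch and not anything taken from Torque: the
function name check_chdir is made up for illustration, and the paths to test
are passed on the command line. It prints the errno name, so EACCES would
point to permissions or root squashing, while ESTALE would point to a stale
NFS file handle.

```python
import errno
import os
import sys


def check_chdir(path):
    """Try to chdir into path and return (ok, errno_name, message).

    Hypothetical helper for illustration only; it is not part of Torque.
    """
    old_cwd = os.getcwd()
    try:
        os.chdir(path)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EACCES or ESTALE.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, name, exc.strerror
    # Restore the previous working directory after a successful chdir.
    os.chdir(old_cwd)
    return True, None, None


if __name__ == "__main__":
    # Usage: run on the compute node as root, then again as the job owner.
    for home in sys.argv[1:]:
        ok, name, msg = check_chdir(home)
        if ok:
            print("%s: OK" % home)
        else:
            print("%s: FAILED, %s (%s)" % (home, name, msg))
```

If the check succeeds as the user but fails as root, that would be consistent
with the root squash explanation. If it succeeds for both, the cause is
probably somewhere else.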

I hope someone can help me. I'm in trouble because of that cluster.

Thanks for your help,

Regards,

Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 2534-1427


Bas van der Vlies wrote:
> Dave Jackson wrote:
>> Bas,
>>
>>   This should be easy to patch but we have so far been unable to
>> reproduce it in our lab with or without root squash.  If any site can
>> reliably reproduce it and is able to work with us, we can most likely
>> correct this today.
>>
> 
> Dave,
> 
>  It is easy for me to reproduce. Just chmod 700 my home directory.
>  Or must I try the new p5 snapshot on one node?
> 
> 
>  We have a timezone difference ;-)
> 
> 
>>
>> On Wed, 2004-12-08 at 03:58, Bas van der Vlies wrote:
>>
>>> We at SARA have the same problem. I have turned on root_squash. The
>>> problem disappeared if I made my home directory 755, but that is not a
>>> real solution. We are using torque 1.1.0p4.
>>>
>>>         Regards
>>>
>>> Leandro Tavares Carneiro wrote:
>>>
>>>> Chris,
>>>>
>>>> I can see the home directories of all users, but I don't have them
>>>> exported with the no_root_squash parameter because we did not need it
>>>> before, and this home area is served to the users by some NetApp
>>>> filers.
>>>>
>>>> We have other clusters here with a much larger number of nodes and we
>>>> never had this problem. The other clusters are Xeon and the OS is the
>>>> old RedHat. This problem only happens with this Opteron/RHEL WS cluster.
>>>>
>>>> Thanks for your help,
>>>>
>>>> Regards,
>>>>
>>>> Leandro Tavares Carneiro
>>>> Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
>>>> Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
>>>> Tel: (0xx21) 2534-1427
>>>>
>>>>
>>>> Chris Samuel wrote:
>>>>
>>>>
>>>>> On Tue, 7 Dec 2004 10:18 pm, Leandro Tavares Carneiro wrote:
>>>>>
>>>>>
>>>>>
>>>>>>       I have checked everything on my nodes and server and everything
>>>>>> is OK. All the nodes recognize the user id I'm using and the home
>>>>>> directory is mounted, but I still get this error.
>>>>>>
>>>>>> Dec  7 09:04:32 node002 pbs_mom: scan_for_exiting, cannot chdir to user home directory
>>>>>
>>>>>
>>>>>
>>>>> Are you exporting the users' home directories with no_root_squash
>>>>> from the NFS server?
>>>>>
>>>>> Easiest way to check that is to log in to node002 as root and then
>>>>> try to cd to the user's home directory - if you get a permission
>>>>> denied error this is probably what's going on.
>>>>>
>>>>> A number of folks have reported this recently, it doesn't affect us 
>>>>> here as we're exporting with no_root_squash (we have total control 
>>>>> over all clients and server).
>>>>>
>>>>> The other time we've seen this is after an NFS server crash when 
>>>>> the clients have stale NFS file handles, again trying the above 
>>>>> should tell you.
>>>>>
>>>>> It would be very nice if the pbs_mom reported the value of errno 
>>>>> and its sys_errlist equivalent. :-)
>>>>>
>>>>> cheers,
>>>>> Chris
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------ 
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> torqueusers mailing list
>>>>> torqueusers at supercluster.org
>>>>> http://supercluster.org/mailman/listinfo/torqueusers
>>>>
>>>
>>
> 
> 


