[torqueusers] Problem with Torque with AMD Opteron and RHEL 3

Leandro Tavares Carneiro leandro at ep.petrobras.com.br
Mon Dec 20 06:26:34 MST 2004


Hi,

I'm writing to say that I have solved the problem here, in an unusual way...

So far, all my tests have run with p5 without any problems.

The solution? Downgrade the OS from Update 3 to Update 2 of RedHat WS 3!

That's it. Thanks, all.

Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 2534-1427


Leandro Tavares Carneiro wrote:
> Well, we have used /etc/hosts.equiv here for a long time, and we never
> needed to put the FQDN or IP address in it. I use only the short name of
> each node and server.
> 
> But, just in case, I ran a test, without success... I think I now have
> to go another way. I will try *older* versions of Torque to see what
> happens. And, if that doesn't change the results, I will try *ancient*
> versions...
> 
> Well, wish me luck!
> 
> Best Regards,
> 
> Leandro Tavares Carneiro
> Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
> Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
> Tel: (0xx21) 2534-1427
> 
> 
> Valery Mitsyn wrote:
>> Did you try to create /etc/hosts.equiv and put the server and all
>> interactive nodes (FQDN, official name and IP address) in it?
>>
>> On Thu, 9 Dec 2004, Leandro Tavares Carneiro wrote:
>>
>>> Bas,
>>>
>>> I have tried changing the permissions of a user's home directory to 777
>>> and the behavior is the same, but it is worse with the p5 snapshot.
>>>
>>> With p3, which is the version working on the other clusters we have
>>> here, with the same users, I can run a job on one machine. It works, but
>>> when I use more than one, it doesn't work...
>>>
>>> I have done some tests using local user accounts and they work. I have
>>> also exported a home area for this user from a Linux server *without* the
>>> no_root_squash parameter. In fact, I used root_squash to enforce that,
>>> and it works correctly.
>>>
>>> I think the problem is somewhere else, and that the effect of chmod on
>>> the home area or of exporting with no_root_squash is a coincidence.
>>>
>>> I hope someone can help me. I'm in trouble because of that cluster.
>>>
>>> Thanks for your help,
>>>
>>> Regards,
>>>
>>> Leandro Tavares Carneiro
>>> Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
>>> Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
>>> Tel: (0xx21) 2534-1427
>>>
>>>
>>> Bas van der Vlies wrote:
>>>> Dave Jackson wrote:
>>>>> Bas,
>>>>>
>>>>>  This should be easy to patch but we have so far been unable to
>>>>> reproduce it in our lab with or without root squash.  If any site can
>>>>> reliably reproduce it and is able to work with us, we can most likely
>>>>> correct this today.
>>>>>
>>>> Dave,
>>>>
>>>> It is easy to reproduce for me: I just chmod 700 my home directory.
>>>> Or should I try the new p5 snapshot on one node?
>>>>
>>>>
>>>> We have a timezone difference ;-)
>>>>
>>>>
>>>>> On Wed, 2004-12-08 at 03:58, Bas van der Vlies wrote:
>>>>>
>>>>>> We at SARA have the same problem. I have turned on root_squash. The
>>>>>> problem disappeared when I made my home directory 755, but that is
>>>>>> not a real solution. We are using Torque 1.1.0p4.
>>>>>>
>>>>>>        Regards
>>>>>>
>>>>>> Leandro Tavares Carneiro wrote:
>>>>>>
>>>>>>> Chris,
>>>>>>>
>>>>>>> I can see the home directories of all users, but I don't have them
>>>>>>> exported with the no_root_squash parameter because we never needed
>>>>>>> it before, and this home area is served to the users by some NetApp
>>>>>>> filers.
>>>>>>>
>>>>>>> We have other clusters here with many more nodes and we never
>>>>>>> had this problem. The other clusters are Xeon and the OS is the old
>>>>>>> RedHat. This problem only happens with this Opteron/RHEL WS cluster.
>>>>>>>
>>>>>>> Thanks for your help,
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Leandro Tavares Carneiro
>>>>>>> Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
>>>>>>> Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
>>>>>>> Tel: (0xx21) 2534-1427
>>>>>>>
>>>>>>>
>>>>>>> Chris Samuel wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Tue, 7 Dec 2004 10:18 pm, Leandro Tavares Carneiro wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> I have checked everything on my nodes and server and
>>>>>>>>> everything is OK. All the nodes can recognize the user id I'm
>>>>>>>>> using and the home directory is mounted, but I still get this error.
>>>>>>>>>
>>>>>>>>> Dec  7 09:04:32 node002 pbs_mom: scan_for_exiting, cannot chdir to user home directory
>>>>>>>>
>>>>>>>>
>>>>>>>> Are you exporting the users' home directories with no_root_squash
>>>>>>>> from the NFS server?
>>>>>>>>
>>>>>>>> The easiest way to check that is to log in to node002 as root and then
>>>>>>>> try to cd to the user's home directory. If you get a permission
>>>>>>>> denied error, this is probably what's going on.
>>>>>>>>
>>>>>>>> A number of folks have reported this recently. It doesn't affect us
>>>>>>>> here as we're exporting with no_root_squash (we have total control
>>>>>>>> over all clients and server).
>>>>>>>>
>>>>>>>> The other time we've seen this is after an NFS server crash when
>>>>>>>> the clients have stale NFS file handles, again trying the above
>>>>>>>> should tell you.
>>>>>>>>
>>>>>>>> It would be very nice if the pbs_mom reported the value of errno
>>>>>>>> and its sys_errlist equivalent. :-)
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> torqueusers mailing list
>>>>>>>> torqueusers at supercluster.org
>>>>>>>> http://supercluster.org/mailman/listinfo/torqueusers
>>>>
>>>
>>
>> Best regards,
>>  Valery Mitsyn
>>
> 
> 
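
The check Chris describes in the thread (become root on the node, try to
enter the user's home directory, and look at the actual errno) can be
sketched in Python. This is only an illustration, assuming Python is
available on the compute node. The helper names check_chdir and
other_can_search are made up for this sketch and are not part of Torque or
pbs_mom. Run as root, an EACCES result would be consistent with root being
squashed on a home directory without "other" search permission (700 rather
than 755, as Bas observed), and ESTALE would be consistent with a stale NFS
file handle. Neither result proves the cause on its own.

```python
import errno
import os
import stat


def check_chdir(path):
    """Try to chdir into path, then restore the previous working directory.

    Returns (True, None, None) on success, or
    (False, errno_name, message) on failure, e.g. (False, "EACCES", "...").
    """
    saved = os.getcwd()
    try:
        os.chdir(path)
    except OSError as exc:
        # Report the symbolic errno name and the system's error message,
        # which is the information the pbs_mom log line does not include.
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return False, name, os.strerror(exc.errno)
    os.chdir(saved)
    return True, None, None


def other_can_search(path):
    """Return True if the directory mode grants search (x) to 'other'.

    With root_squash, root on an NFS client is usually mapped to an
    unprivileged user, so it only gets the 'other' permission bits.
    """
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IXOTH)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_home.py /home/someuser
    for home in sys.argv[1:]:
        ok, name, msg = check_chdir(home)
        if ok:
            print("%s: chdir OK" % home)
            continue
        print("%s: chdir failed: %s (%s)" % (home, name, msg))
        try:
            print("%s: 'other' may search: %s" % (home, other_can_search(home)))
        except OSError as exc:
            print("%s: stat failed: %s" % (home, os.strerror(exc.errno)))
```

Running it as root on an affected node and again as the job's owner should
show whether the failure is specific to root, which is what the
no_root_squash question in the thread is trying to establish.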

