[torqueusers] Specific Users Access

Dimitrakakis Georgios giwrgis at chemistry.uoc.gr
Sat Mar 8 13:11:16 MST 2014


>
> On Mar 8, 2014, at 2:53 PM, Dimitrakakis Georgios wrote:
>
>>
>>> On 03/07/2014 07:35 PM, Dimitrakakis Georgios wrote:
>>>>
>>>>> On 03/07/2014 03:48 PM, Dimitrakakis Georgios wrote:
>>>>>>
>>>>>>> On Friday, 07 March 2014, at 17:56:34 (+0200),
>>>>>>> Dimitrakakis Georgios wrote:
>>>>>>>
>>>>>>>> Thanks for the feedback! Apparently the problem has nothing to do
>>>>>>>> with Torque.
>>>>>>>>
>>>>>>>> For some reason on the first node the
>>>>>>>>
>>>>>>>> /usr/soft/application
>>>>>>>>
>>>>>>>> belongs to root:group
>>>>>>>>
>>>>>>>> but on the rest it becomes root:nobody
>>>>>>>>
>>>>>>>> For some reason the other nodes do not see the proper group
>>>>>>>> ownership...
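
A quick way to compare what each client actually sees; a sketch, where the
node names below are placeholders for this cluster:

    for n in node01 node02 node03; do
        echo "== $n =="
        ssh "$n" ls -ld /usr/soft/application   # compare owner:group per node
    done
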
>>>>>>>
>>>>>>> Is the group listed in /etc/group on the nodes with the same GID?
>>>>>>>
>>>>>>> Also, is it mounted via NFSv4 or NFSv3 (or something else)?  Try
>>>>>>> NFSv3
>>>>>>> if it's not already mounted with that.
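
Both checks, as a minimal sketch (the group name, server, export, and
mount point below are placeholders; adjust to wherever
/usr/soft/application actually lives):

    getent group mygroup    # run on every node; name and GID must match
    nfsstat -m              # mount options will show vers=3 or vers=4
    # forcing NFSv3 needs a full unmount/mount, not just a remount:
    umount /usr/soft
    mount -t nfs -o vers=3 nfsserver:/export/soft /usr/soft
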
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>> --
>>>>>>> Michael Jennings<mej at lbl.gov>
>>>>>>> Senior HPC Systems Engineer
>>>>>>> High-Performance Computing Services
>>>>>>> Lawrence Berkeley National Laboratory
>>>>>>> Bldg 50B-3209E        W: 510-495-2687
>>>>>>> MS 050B-3209          F: 510-486-8615
>>>>>>
>>>>>> Yes, the group is listed with the same GID on all nodes!
>>>>>>
>>>>>> The odd thing is that only 2 nodes of the entire cluster are having
>>>>>> this problem. My guess is that it is NFS-related, but all nodes have
>>>>>> the same configuration... I've put the odd nodes offline to examine
>>>>>> the problem further.
>>>>>>
>>>>>> G.
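
For reference, taking a node out of scheduling in Torque is done with
pbsnodes (the node name is a placeholder):

    pbsnodes -o node05    # mark offline; running jobs are left alone
    pbsnodes -c node05    # clear the offline flag once the node is fixed
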
>>>>>>
>>>>>>
>>>>> Hi Georgios
>>>>>
>>>>> If you're using NFSv4, check if these two nodes have rpcidmapd
>>>>> running.
>>>>> [NFS v4 authentication is a pain.]
>>>>>
>>>>> Your NFS server's /var/log/messages may show some clue about why the
>>>>> group you want gets switched to "nobody" when the directories are
>>>>> accessed from those two nodes.
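
On a RHEL/CentOS-style system of that era, a quick check on each suspect
node might look like this (a sketch; init script names vary by
distribution):

    service rpcidmapd status    # is the idmap daemon running?
    pgrep -l rpc.idmapd         # double-check the process itself
    # on the NFS server, watch for mapping failures while the share is
    # touched from one of the bad nodes:
    tail -f /var/log/messages | grep -i idmap
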
>>>>>
>>>>> I hope this helps,
>>>>> Gus Correa
>>>>
>>>> Hi Gus,
>>>>
>>>> rpc.idmapd was running, and I couldn't find anything useful in the
>>>> logs...
>>>>
>>>> Since one node wasn't occupied at all, I gave it a shot by rebooting
>>>> it, and everything appears back to normal! I'll do that with the
>>>> second node and hopefully it will be resolved there as well.
>>>>
>>>> Regards,
>>>>
>>>> G.
>>>>
>>>>
>>>
>>> Hi Georgios
>>>
>>> Did your server syslog show "Stale client ..." messages?
>>> We have gotten these sometimes; some seem to have "healed" on their
>>> own, but in some cases only a client node reboot would bring things
>>> back to normal. NFSv4 has been more pain than joy.
>>>
>>> You could increase the rpcidmapd verbosity level in /etc/idmapd.conf.
>>> I think the default is 0; I increased it to 5.
>>> When things are working it fills the log with useless messages:
>>> Server : (user) id ...
>>> nfs4_uid_to_name ...
>>> ...
>>> Server : (group) id ...
>>> nfs4_gid_to_name...
>>> ...
>>> But at least it shows when user/group authentication doesn't work.
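
The knob in question, as a sketch (keep whatever Domain your site already
uses; the value below is a placeholder):

    # /etc/idmapd.conf
    [General]
    Verbosity = 5
    Domain = example.com

    # restart the daemon so the new level takes effect:
    service rpcidmapd restart
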
>>>
>>> I hope this helps,
>>> Gus Correa
>> Nothing like that in the log files!
>>
>> I've increased the verbosity and will keep monitoring, but I think I
>> will have to reboot the "infected" node for the problem to be resolved.
>>
>> Thanks for the feedback Gus!
>>
>> Cheers,
>>
>> G.
>
>
> Hi Georgios
>
> Well, this won't fix anything, but it may help your diagnosis.
>
> You may need to restart the NFS server, or at least restart the rpcidmapd
> daemon there
> for the higher verbosity level to take effect, but I don't know if this
> will disturb ongoing NFS operations.
>
> Yes, most likely you will need to reboot the problematic node as well.
> It may not be connecting to the NFS server properly now, anyway.
>
> Gus Correa
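
One thing that may be worth trying before a full reboot, assuming a recent
enough nfs-utils on the client: flush the kernel's idmap cache and remount
the share (the mount point is a placeholder):

    nfsidmap -c       # clear the keyring-based idmap cache (newer kernels)
    umount /usr/soft
    mount /usr/soft   # relies on the existing /etc/fstab entry
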

Hi again Gus!

Yes! I know it won't fix anything, but maybe it will give me more info on
why this is happening. I have restarted all the relevant services and will
reboot the node at the first opportunity.

Regards,

G.

