[torqueusers] Specific Users Access

Dimitrakakis Georgios giwrgis at chemistry.uoc.gr
Sat Mar 8 12:53:48 MST 2014


> On 03/07/2014 07:35 PM, Dimitrakakis Georgios wrote:
>>
>>> On 03/07/2014 03:48 PM, Dimitrakakis Georgios wrote:
>>>>
>>>>> On Friday, 07 March 2014, at 17:56:34 (+0200),
>>>>> Dimitrakakis Georgios wrote:
>>>>>
>>>>>> Thanks for the feedback! Apparently the problem has nothing to do with
>>>>>> Torque.
>>>>>>
>>>>>> For some reason, on the first node
>>>>>>
>>>>>> /usr/soft/application
>>>>>>
>>>>>> belongs to root:group
>>>>>>
>>>>>> but on the rest it shows up as root:nobody
>>>>>>
>>>>>> For some reason the other nodes cannot get the proper permissions...
>>>>>
>>>>> Is the group listed in /etc/group on the nodes with the same GID?
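
The GID comparison asked about here can be scripted. What follows is a minimal sketch, assuming each node's /etc/group has already been collected as text (for example over ssh); the function names, node names and the group name "chem" are hypothetical and only serve as illustration. It looks only at /etc/group-style text, so groups served by NIS or LDAP would not be seen.

```python
def group_gid(group_file_text, group_name):
    """Return the numeric GID of group_name from /etc/group-style text, or None."""
    for line in group_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # /etc/group format: name:password:GID:member,member,...
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == group_name and fields[2].isdigit():
            return int(fields[2])
    return None


def gid_mismatches(group_files_by_node, group_name):
    """Return {node: gid} for every node whose GID differs from the first node's."""
    gids = {node: group_gid(text, group_name)
            for node, text in group_files_by_node.items()}
    if not gids:
        return {}
    reference = next(iter(gids.values()))  # first node is taken as the reference
    return {node: gid for node, gid in gids.items() if gid != reference}
```

An empty result means every node agrees with the first one; a node missing the group entirely shows up with a value of None.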
>>>>>
>>>>> Also, is it mounted via NFSv4 or NFSv3 (or something else)?  Try NFSv3
>>>>> if it's not already mounted with that.
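
To see which NFS version a node is actually using, the mount table can be inspected. This is a rough sketch, assuming a Linux client where /proc/mounts lists NFS entries with a filesystem type of nfs or nfs4 and a vers= (or nfsvers=) option; the function name is hypothetical and the sample mount points in any test data are made up.

```python
def nfs_mounts(mounts_text):
    """Return (mount_point, fstype, vers) for each NFS entry in /proc/mounts-style text."""
    result = []
    for line in mounts_text.splitlines():
        # /proc/mounts format: device mountpoint fstype options dump pass
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        vers = None
        for opt in fields[3].split(","):
            if opt.startswith("vers=") or opt.startswith("nfsvers="):
                vers = opt.split("=", 1)[1]
        result.append((fields[1], fields[2], vers))
    return result
```

On a node it could be fed the contents of /proc/mounts, and the output compared between a working node and one of the odd ones.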
>>>>>
>>>>> Michael
>>>>>
>>>>> --
>>>>> Michael Jennings <mej at lbl.gov>
>>>>> Senior HPC Systems Engineer
>>>>> High-Performance Computing Services
>>>>> Lawrence Berkeley National Laboratory
>>>>> Bldg 50B-3209E        W: 510-495-2687
>>>>> MS 050B-3209          F: 510-486-8615
>>>>> _______________________________________________
>>>>> torqueusers mailing list
>>>>> torqueusers at supercluster.org
>>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>>
>>>>> --
>>>>
>>>> Yes, the group is listed with the same GID on all nodes!
>>>>
>>>> The odd thing is that only 2 nodes of the entire cluster are having this
>>>> problem. My guess is that it is NFS related, but all nodes have the same
>>>> configuration... I've put the odd nodes offline to examine the problem
>>>> further.
>>>>
>>>> G.
>>>>
>>>>
>>> Hi Georgios
>>>
>>> If you're using NFSv4, check if these two nodes have rpcidmapd running.
>>> [NFS v4 authentication is a pain.]
>>>
>>> Your NFS server /var/log/messages may show some clue about why it
>>> switches from the group you want to "nobody", when the directories
>>> are accessed from those two nodes.
>>>
>>> I hope this helps,
>>> Gus Correa
>>
>> Hi Gus,
>>
>> rpc.idmapd was running and I couldn't find anything useful in the logs.
>>
>> Since one node wasn't occupied at all, I gave it a shot by rebooting it,
>> and everything appears to be back to normal! I'll do the same with the
>> second node and hopefully it will be resolved there as well.
>>
>> Regards,
>>
>> G.
>>
>>
>
> Hi Georgios
>
> Did your server syslog show "Stale client ..." messages?
> We get these sometimes; some seem to have "healed" on their own,
> but in some cases only a reboot of the client node would bring
> things back to normal.
> NFSv4 has been more pain than joy.
>
> You could increase the rpcidmapd verbosity level in /etc/idmapd.conf.
> I think the default is 0; I increased it to 5.
> When things are working it fills the log with useless messages:
> Server : (user) id ...
> nfs4_uid_to_name ...
> ...
> Server : (group) id ...
> nfs4_gid_to_name...
> ...
> But at least it shows when user/group authentication doesn't work.
>
> I hope this helps,
> Gus Correa
>
Nothing like that in the log files!

I've increased the verbosity and will keep monitoring, but I think that I
will have to reboot the "infected" node for the problem to be resolved.
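
For reference, the verbosity change is, as far as I know, a single setting in the [General] section of /etc/idmapd.conf. A minimal fragment, assuming the stock file layout and the value 5 that Gus mentioned (any other settings in the file stay as they are):

```ini
[General]
# 0 is believed to be the default; higher values log the individual
# uid/gid name lookups.
Verbosity = 5
```

The ID-mapping daemon usually has to be restarted to pick up the change; the service name differs between distributions.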

Thanks for the feedback Gus!

Cheers,

G.

