[torqueusers] Specific Users Access

Gustavo Correa gus at ldeo.columbia.edu
Sat Mar 8 13:06:08 MST 2014


On Mar 8, 2014, at 2:53 PM, Dimitrakakis Georgios wrote:

> 
>> On 03/07/2014 07:35 PM, Dimitrakakis Georgios wrote:
>>> 
>>>> On 03/07/2014 03:48 PM, Dimitrakakis Georgios wrote:
>>>>> 
>>>>>> On Friday, 07 March 2014, at 17:56:34 (+0200),
>>>>>> Dimitrakakis Georgios wrote:
>>>>>> 
>>>>>>> Thx for the feedback! Apparently the problem has nothing to do with
>>>>>>> Torque.
>>>>>>> 
>>>>>>> For some reason on the first node the
>>>>>>> 
>>>>>>> /usr/soft/application
>>>>>>> 
>>>>>>> belongs to root:group
>>>>>>> 
>>>>>>> but on the rest it becomes root:nobody
>>>>>>> 
>>>>>>> For some reason other nodes cannot get the proper permissions...
>>>>>> 
>>>>>> Is the group listed in /etc/group on the nodes with the same GID?
>>>>>> 
>>>>>> Also, is it mounted via NFSv4 or NFSv3 (or something else)?  Try
>>>>>> NFSv3 if it's not already mounted with that.
>>>>>> 
>>>>>> Michael
>>>>>> 
>>>>>> --
>>>>>> Michael Jennings<mej at lbl.gov>
>>>>>> Senior HPC Systems Engineer
>>>>>> High-Performance Computing Services
>>>>>> Lawrence Berkeley National Laboratory
>>>>>> Bldg 50B-3209E        W: 510-495-2687
>>>>>> MS 050B-3209          F: 510-486-8615
>>>>>> _______________________________________________
>>>>>> torqueusers mailing list
>>>>>> torqueusers at supercluster.org
>>>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>>> 
>>>>>> --
>>>>> 
>>>>> Yes the group is listed with the same GID on all nodes!
>>>>> 
>>>>> The odd thing is that only 2 nodes of the entire cluster are having
>>>>> this problem. My guess is that it is NFS related, but all nodes have
>>>>> the same configuration... I've put the odd nodes offline to examine
>>>>> the problem further.
>>>>> 
>>>>> G.
>>>>> 
>>>>> 
>>>> Hi Georgios
>>>> 
>>>> If you're using NFSv4, check if these two nodes have rpcidmapd running.
>>>> [NFS v4 authentication is a pain.]
>>>> 
>>>> Your NFS server /var/log/messages may show some clue about why it
>>>> switches from the group you want to "nobody", when the directories
>>>> are accessed from those two nodes.
>>>> 
>>>> I hope this helps,
>>>> Gus Correa
>>> 
>>> Hi Gus,
>>> 
>>> rpc.idmapd was running and I couldn't find anything useful in the logs.
>>> 
>>> Since one node wasn't occupied at all I gave it a shot by rebooting it,
>>> and everything appears to be back to normal! I'll do that with the
>>> second node and hopefully it will be resolved there as well.
>>> 
>>> Regards,
>>> 
>>> G.
>>> 
>>> 
>> 
>> Hi Georgios
>> 
>> Did your server syslog show "Stale client ..." messages?
>> We get these sometimes. Some seem to have "healed" on their own,
>> but in other cases only a reboot of the client node brought
>> things back to normal.
>> NFSv4 has been more pain than joy.
>> 
>> You could increase the rpcidmapd verbosity level in /etc/idmapd.conf.
>> I think the default is 0; I increased it to 5.
>> When things are working it fills the log with useless messages:
>> Server : (user) id ...
>> nfs4_uid_to_name ...
>> ...
>> Server : (group) id ...
>> nfs4_gid_to_name...
>> ...
>> But at least it shows when user/group authentication doesn't work.
>> 
>> I hope this helps,
>> Gus Correa
>> --
> Nothing like these in the log files!
> 
> I've increased the verbosity and will keep monitoring but I think that I
> have to reboot the "infected" node for the problem to be resolved.
> 
> Thanks for the feedback Gus!
> 
> Cheers,
> 
> G.


Hi Georgios

Well, this won't fix anything, but it may help your diagnosis.

You may need to restart the NFS server, or at least restart the rpcidmapd daemon there
for the higher verbosity level to take effect, but I don't know if this will disturb ongoing NFS operations.
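
To confirm which verbosity level a given idmapd.conf actually sets, something like
the following could be used. It is only a minimal sketch: it assumes the usual INI-style
layout with a [General] section and a Verbosity key, the function name is hypothetical,
and the fallback of 0 is just my understanding of the default.

```python
import configparser


def idmapd_verbosity(conf_text, default=0):
    """Return the Verbosity value from the [General] section of idmapd.conf text.

    Falls back to `default` (assumed to be 0) when the section or key is
    missing, or when the key is commented out.
    """
    # strict=False tolerates repeated keys; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.getint("General", "Verbosity", fallback=default)


# Illustrative config text, not taken from any real server.
sample = """[General]
Verbosity = 5
"""
print("Verbosity:", idmapd_verbosity(sample))
```

On a real machine you would pass in the contents of /etc/idmapd.conf instead of the sample text.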

Yes, most likely you will need to reboot the problematic node also.
It may not be connecting to the NFS server properly now, anyway.
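
To spot which nodes show the symptom before and after a reboot, a small check along
these lines could be run on each node against a directory such as /usr/soft/application.
This is a sketch under assumptions: the function names are hypothetical, and treating
"nobody" and "nogroup" as the fallback group names is a guess that depends on how the
Nobody-Group mapping is configured on your systems.

```python
import grp
import os

# Assumed fallback names; adjust to match your idmapd Nobody-Group setting.
SQUASHED_NAMES = {"nobody", "nogroup"}


def group_name(path):
    """Return the name of the group owning `path`.

    If the GID has no entry in the group database, return the numeric
    GID as a string instead.
    """
    gid = os.stat(path).st_gid
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def looks_squashed(name):
    """Return True if `name` is one of the assumed fallback group names."""
    return name in SQUASHED_NAMES


def report(path):
    """Build a one-line summary of the group ownership seen for `path`."""
    name = group_name(path)
    status = "SUSPECT" if looks_squashed(name) else "ok"
    return f"{path}: group={name} [{status}]"
```

Calling report("/usr/soft/application") on a healthy node should show the expected group,
while an affected node would be flagged as SUSPECT.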

Gus Correa

