[torqueusers] Specific Users Access

Gus Correa gus at ldeo.columbia.edu
Fri Mar 7 18:04:07 MST 2014


On 03/07/2014 07:35 PM, Dimitrakakis Georgios wrote:
>
>> On 03/07/2014 03:48 PM, Dimitrakakis Georgios wrote:
>>>
>>>> On Friday, 07 March 2014, at 17:56:34 (+0200),
>>>> Dimitrakakis Georgios wrote:
>>>>
>>>>> Thx for the feedback! Apparently the problem has nothing to do with
>>>>> Torque.
>>>>>
>>>>> For some reason on the first node the
>>>>>
>>>>> /usr/soft/application
>>>>>
>>>>> belongs to root:group
>>>>>
>>>>> but on the rest it becomes root:nobody
>>>>>
>>>>> For some reason other nodes cannot get the proper permissions...
>>>>
>>>> Is the group listed in /etc/group on the nodes with the same GID?
>>>>
>>>> Also, is it mounted via NFSv4 or NFSv3 (or something else)?  Try NFSv3
>>>> if it's not already mounted with that.
>>>>
>>>> Michael
>>>>
>>>> --
>>>> Michael Jennings<mej at lbl.gov>
>>>> Senior HPC Systems Engineer
>>>> High-Performance Computing Services
>>>> Lawrence Berkeley National Laboratory
>>>> Bldg 50B-3209E        W: 510-495-2687
>>>> MS 050B-3209          F: 510-486-8615
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>
>>>> --
>>>
>>> Yes the group is listed with the same GID on all nodes!
>>>
>>> The odd thing is that only 2 nodes of the entire cluster are having this
>>> problem. My guess is that it is NFS related, but all nodes have the same
>>> configuration... I've put the odd nodes offline to examine the problem
>>> further.
>>>
>>> G.
>>>
>>>
>> Hi Georgios
>>
>> If you're using NFSv4, check if these two nodes have rpcidmapd running.
>> [NFS v4 authentication is a pain.]
>>
>> Your NFS server /var/log/messages may show some clue about why it
>> switches from the group you want to "nobody", when the directories
>> are accessed from those two nodes.
>>
>> I hope this helps,
>> Gus Correa
>
> Hi Gus,
>
> rpc.idmapd was running and I couldn't find anything useful in the logs.
>
> Since one node wasn't occupied at all, I gave it a shot by rebooting it, and
> everything appears back to normal! I'll do that with the second node and
> hopefully it will be resolved there as well.
>
> Regards,
>
> G.
>
>

Hi Georgios

Did your server syslog show "Stale client ..." messages?
We get these sometimes. Some seem to have "healed" on their own,
but in other cases only rebooting the client node brought
things back to normal.
NFSv4 has been more pain than joy.
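
To spot which clients are affected without logging in and eyeballing
"ls -l" on each one, something like the sketch below could be run on every
node against the shared directory. It only uses Python's standard os and
grp modules. The function names and the list of placeholder group names
("nobody", "nogroup") are assumptions for illustration, so adjust them to
whatever your distribution shows when the ID mapping fails.

```python
import grp
import os


def group_name(path):
    """Return the group name owning `path`, or the numeric GID as a string
    when the GID has no entry in the local group database."""
    gid = os.stat(path).st_gid
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def looks_unmapped(path, placeholders=("nobody", "nogroup")):
    """True when the owning group of `path` is one of the placeholder names
    that typically appear when the ID mapping did not work (assumed list)."""
    return group_name(path) in placeholders


if __name__ == "__main__":
    # Path taken from earlier in this thread; change it for your site.
    target = "/usr/soft/application"
    if os.path.exists(target):
        print(os.uname().nodename, group_name(target), looks_unmapped(target))
```

Running it on a good node and on a bad node should make the difference
(the expected group versus a placeholder) visible at a glance.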

You could increase the rpcidmapd verbosity level in /etc/idmapd.conf.
I think the default is 0; I increased it to 5.
When things are working, it fills the log with useless messages:

  Server : (user) id ...
  nfs4_uid_to_name ...
  ...
  Server : (group) id ...
  nfs4_gid_to_name...
  ...

But at least it shows when user/group authentication doesn't work.
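
To check which verbosity level a client is actually configured with, the
setting can be read back from the file. Below is a minimal sketch in Python,
assuming the file is INI-style with Verbosity under the [General] section,
and treating a missing setting as 0. The function name and the sample text
(including the Domain value) are hypothetical and only there for
illustration.

```python
import configparser


def idmapd_verbosity(conf_text):
    """Return the Verbosity value found in idmapd.conf-style text.

    Falls back to 0 when the [General] section or the Verbosity key is
    missing (assumed default).
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # Option names are matched case-insensitively by configparser.
    return parser.getint("General", "Verbosity", fallback=0)


# Hypothetical file contents, not taken from a real system.
SAMPLE = """\
[General]
Verbosity = 5
Domain = example.org
"""

if __name__ == "__main__":
    print(idmapd_verbosity(SAMPLE))
```

On a real client you would pass the contents of /etc/idmapd.conf instead of
SAMPLE. Comparing the value across nodes is a quick way to confirm that the
odd nodes really share the configuration of the good ones.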

I hope this helps,
Gus Correa

