[torqueusers] connect to specific nodes within a cluster
Jerry Smith
jdsmit at sandia.gov
Tue Apr 28 09:12:59 MDT 2009
> Thanks for your answer, Jerry. It seems the problem is related to your
> comment "Are these nodes allocated to a job of yours?"
>
> Indeed, I've realized that if I have a job allocated to a node, I can
> simply access it with
>
>     ssh nodename
>
> So, more specifically, the problem is how to access a node where I have
> no active jobs of my own. If a job is interrupted with qdel, I've noticed
> that some related processes may keep running on the slave nodes, so it
> would be useful to me as a user to be able to access those nodes and kill
> these processes manually.
>
The administrators of this cluster need to add some process cleanup to
the epilogue script, so that "leftover" user processes are purged when
the job ends.
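For illustration only, a minimal epilogue cleanup might look like the sketch below. It assumes nodes are allocated exclusively (one job, and therefore one user, per node) and relies on Torque running the epilogue as root with the job id as $1 and the job owner's user name as $2. The function name `epilogue_cleanup` and the `DRY_RUN` switch are hypothetical, added only so the logic can be checked without killing anything. The script normally lives in `mom_priv/epilogue` under the pbs_mom spool directory (typically `/var/spool/torque/mom_priv/epilogue`) and must be owned by root and not writable by others.

```shell
#!/bin/sh
# Sketch of a Torque epilogue that purges a job owner's leftover processes.
# ASSUMPTION: nodes are allocated exclusively, so every process still owned
# by the job owner on this node belongs to the job that just ended.
# Torque runs the epilogue as root, passing the job id as $1 and the job
# owner's user name as $2.

# Hypothetical helper; DRY_RUN=1 prints the command instead of running it.
epilogue_cleanup() {
    jobid="$1"
    user="$2"

    # Nothing to do without a user name, and never touch root's processes.
    if [ -z "$user" ] || [ "$user" = "root" ]; then
        return 0
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "pkill -KILL -u $user"
        return 0
    fi

    # SIGKILL every process owned by the job owner. pkill exits 1 when no
    # process matched, which is the normal case and not an error here.
    pkill -KILL -u "$user" || true
    logger -t epilogue "job $jobid: purged leftover processes of $user"
    return 0
}

epilogue_cleanup "$@"
```

On shared nodes, where one user may have several jobs on the same node, a blanket `pkill -u` would also kill that user's other jobs; a real epilogue would have to restrict itself to the ending job's processes (for example starting from the session id Torque passes as $5, although processes that detached into a new session would escape that).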
Allowing users access to nodes that are not running one of their jobs is
a security risk IMHO: you could ssh to a node running someone else's job
and possibly access data that doesn't belong to you. Or, on a
non-nefarious note, as a user not assigned to the node, you could
"accidentally" kill the wrong process and hurt the other user's job.
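One common way to enforce that policy, sketched here as an assumption about how such a cluster might be configured rather than a description of this one, is to combine Torque's `pam_pbssimpleauth` PAM module (built from the Torque sources, if enabled), which admits a user only while they have a job on the node, with `pam_access` refusing everyone else except root. Module availability, file paths, and whether sshd has PAM enabled depend on the Torque build and the distribution:

```
# /etc/pam.d/sshd on each compute node (account section)
account    sufficient   pam_pbssimpleauth.so
account    required     pam_access.so

# /etc/security/access.conf on each compute node
-:ALL EXCEPT root:ALL
```

Because the first module is `sufficient`, a user with a job on the node is admitted immediately; anyone else falls through to `pam_access`, which denies every login except root's.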
--Jerry
> Actually, my "remaining" processes have just finished, but this would
> still be useful in the near future.
>
> Javier
> ...
>>
>>> Hello all,
>>> (I'm new to cluster usage)
>>>
>>> If I log into a Torque cluster, e.g.:
>>>
>>>     ssh -Y myusername@cluster.domain.org
>>>
>>> and this cluster has nodes with the names:
>>>
>>> cluster01.domain.org
>>> cluster02.domain.org
>>> ...
>>> clusternn.domain.org
>>>
>>> How can I, after logging into the cluster, connect to a specific node
>>> to see the active processes on it? I would like to monitor specific
>>> processes (and possibly kill them) on specific nodes. I've tried
>>> several options:
>>>
>>> [myusername@master00 ~]$ ssh -Y cluster01
>>> [myusername@master00 ~]$ ssh -Y myusername@cluster01
>>> [myusername@master00 ~]$ ssh -Y myusername@cluster01.domain.org
>>>
>>> without success. In all cases I am asked for a password, and my login
>>> password is not accepted.
>>>
>>>
>> It all depends on how security and access are set up on that specific
>> cluster. What is the access model: ssh, rsh, etc.? Is it PAM-based, or
>> /etc/security/access.conf? Do they use ssh keys? Are these nodes
>> allocated to a job of yours?
>>
>>
>> --Jerry