[torqueusers] connect to specific nodes within a cluster

Jerry Smith jdsmit at sandia.gov
Tue Apr 28 09:12:59 MDT 2009


> Thanks for your answer, Jerry. It seems the problem is related to your
> comment "Are these nodes allocated to a job of yours?"
>
> Indeed, I've realized that if I have a job allocated to a node, I can
> simply access it with
>
>   
>> ssh nodename
>>     
>
> So, more specifically, the problem is how to access a node where I have
> no active jobs of my own. I've noticed that if a job is interrupted with
> qdel, some related processes may keep running on the slave nodes, so it
> would be useful for me as a user to be able to log in to those nodes and
> kill the leftover processes manually.
>   
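For the case that does work (a node running one of your own jobs), here is a minimal sketch of finding the job's nodes and listing your processes on them. It assumes the site allows ssh to nodes running your job, and that you take the node list either from the exec_host line that `qstat -f <jobid>` shows for a running job (long values may be wrapped over several lines) or, inside a job, from `$PBS_NODEFILE`. The helper names exec_host_nodes and show_my_processes are hypothetical.

```shell
#!/bin/sh
# Sketch: show your own processes on the nodes of one of your running jobs.
# ASSUMPTION: the site lets you ssh to nodes that are running one of your jobs.

# Turn a TORQUE exec_host value such as "cluster01/1+cluster01/0+cluster02/0"
# into distinct node names, one per line (the "/slot" suffix is dropped).
exec_host_nodes() {
    printf '%s\n' "$1" | tr '+' '\n' | sed 's|/.*||' | sort -u
}

# For each node name read on stdin, ssh to it and list the caller's processes.
# ssh -n stops ssh from swallowing the rest of the node list on stdin.
show_my_processes() {
    me=$(id -un)
    while read -r node; do
        [ -n "$node" ] || continue
        echo "== $node =="
        ssh -n "$node" ps -u "$me" -o pid,etime,args
    done
}
```

Usage: `exec_host_nodes "cluster01/1+cluster01/0+cluster02/0" | show_my_processes`, or inside a job script `sort -u "$PBS_NODEFILE" | show_my_processes`. Once the job is gone (e.g. after qdel), the same ssh may be refused, which is the situation described in the quoted message.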
The administrators of this cluster need to have some process cleanup run
in the epilogue to make sure that "leftover" user processes are purged
at job end.
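A minimal sketch of such an epilogue cleanup, assuming Linux nodes with procps `ps`, nodes allocated exclusively to one job at a time, and TORQUE's convention of passing the job id and the job owner as the first two epilogue arguments. The helper names leftover_pids and cleanup_node and the 5-second grace period are illustrative assumptions. TORQUE expects the script (e.g. `$PBS_HOME/mom_priv/epilogue`) to be owned by root and not writable by others; depending on the version, sister nodes of a multi-node job may need a separate script such as epilogue.parallel, so check the admin guide for your release.

```shell
#!/bin/sh
# Sketch of a TORQUE epilogue cleanup (e.g. $PBS_HOME/mom_priv/epilogue).
# pbs_mom runs it as root with the job id in $1 and the job owner in $2.
# ASSUMPTION: each node runs at most one job at a time, so every process the
# job owner still has on this node is a leftover from the job that just ended.

# Read "PID UID" lines on stdin and print the PIDs that belong to UID $1.
leftover_pids() {
    awk -v u="$1" '$2 == u { print $1 }'
}

# Send TERM, wait briefly, then KILL whatever the job owner still has running.
cleanup_node() {
    user=$2
    [ -n "$user" ] || return 0
    # Unknown user or root: do nothing.
    uid=$(id -u "$user" 2>/dev/null) || return 0
    [ "$uid" -ne 0 ] || return 0

    pids=$(ps -e -o pid= -o uid= | leftover_pids "$uid")
    if [ -n "$pids" ]; then
        # $pids is left unquoted on purpose so each PID becomes one argument.
        kill -TERM $pids 2>/dev/null
        sleep 5
        # Re-read the list so PIDs recycled by other users are not hit.
        pids=$(ps -e -o pid= -o uid= | leftover_pids "$uid")
        if [ -n "$pids" ]; then
            kill -KILL $pids 2>/dev/null
        fi
    fi
    return 0
}

cleanup_node "$@"
```

On nodes shared by several jobs, killing every process of the user would also hit that user's other jobs there, so such a site would need to match processes to the finished job (for example via its session id) instead.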

Allowing users access to a node that is not running one of their jobs is
a security risk IMHO: you could ssh to a node running someone else's job
and possibly access data that does not belong to you. Or, on a
non-nefarious note, as a user not assigned to the node you could
"accidentally" kill the wrong process and hurt the other user's job.

--Jerry

> Actually, my "remaining" processes have just finished, but this would
> still be useful in the near future.
>
> Javier
> ...
>
>
>   
>>     
>>> Hello all,
>>> (I'm new to cluster usage)
>>>
>>> If I log into a torque cluster, e.g.:
>>>
>>>
>>>       
>>>> ssh -Y myusername@cluster.domain.org
>>>>
>>>>         
>>> and this cluster has nodes with the names:
>>>
>>> cluster01.domain.org
>>> cluster02.domain.org
>>> ...
>>> clusternn.domain.org
>>>
>>> After logging in to the cluster, how can I connect to a specific node
>>> to see the active processes on it? I would like to monitor specific
>>> processes (and possibly kill them) on specific nodes. I've tried
>>> several options:
>>>
>>> [myusername@master00 ~]ssh -Y cluster01
>>> [myusername@master00 ~]ssh -Y myusername@cluster01
>>> [myusername@master00 ~]ssh -Y myusername@cluster01.domain.org
>>>
>>> without success. In every case I am asked for a password, and my login
>>> password is not accepted.
>>>
>>>       
>> It all depends on how security and access are set up for that specific
>> cluster. What is the access model: ssh, rsh, etc.? Is it PAM-based, or
>> /etc/security/access.conf? Do they use ssh keys? Are these nodes
>> allocated to a job of yours?
>>
>>
>> --Jerry
>>
>>     
>
>
>   
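On the access-model question quoted above ("Do they use ssh-keys?"): on clusters where home directories are shared between the login node and the compute nodes and sshd accepts public-key logins, a password prompt like the one described is sometimes avoided by adding your own key to ~/.ssh/authorized_keys. The sketch below is hypothetical (the function name setup_node_key and the choice of an RSA key without a passphrase are assumptions). Check with the administrators first: PAM or /etc/security/access.conf rules can still refuse logins to nodes that are not running one of your jobs, which is the policy argued for in this thread.

```shell
#!/bin/sh
# Sketch: key-based ssh from the login node to the compute nodes.
# ASSUMPTIONS: home directories are shared (e.g. over NFS), so one
# authorized_keys file is seen on every node, and sshd accepts public keys.
# Site policy (PAM, /etc/security/access.conf) may still refuse the login.
setup_node_key() {
    sshdir=$1
    key="$sshdir/id_rsa"
    mkdir -p "$sshdir" || return 1
    chmod 700 "$sshdir"
    # Create a key pair only if none exists yet. An empty passphrase (-N "")
    # is convenient but weaker; an ssh-agent setup is the safer alternative.
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -N "" -f "$key" || return 1
    fi
    touch "$sshdir/authorized_keys"
    chmod 600 "$sshdir/authorized_keys"
    # Append the public key only once, even if this is run repeatedly.
    if ! grep -qxF "$(cat "$key.pub")" "$sshdir/authorized_keys"; then
        cat "$key.pub" >> "$sshdir/authorized_keys"
    fi
}
```

Usage, once per account: `setup_node_key "$HOME/.ssh"`, then `ssh cluster01 hostname` to check whether a password is still requested.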