[torqueusers] Queue Problem

Michel Béland michel.beland at calculquebec.ca
Thu Sep 12 07:37:54 MDT 2013


Jurgens de Bruin wrote:
> Hi
>
> This is driving me crazy...
>
> I have 3 queues: a default batch queue and two additional
> "specialized" ones. If I submit a job to two of the queues, the job
> executes without any problems, but one of the "specialized" queues
> does not seem to work. This is the queue setup:
>
> # Create and define queue himem
> #
> create queue himem
> set queue himem queue_type = Execution
> set queue himem resources_default.neednodes = bigmem
>
> So queues clc and batch work perfectly, but himem produces the following error:
>
> *** error from copy
> Host key verification failed.
> lost connection
> *** end error output
> Output retained on that host in: 
> /var/spool/torque/undelivered/49.manager.OU
>
> Any ideas or suggestions would be appreciated.

Find out which node ran job 49. Then try to ssh from this node to the 
server. To debug this, I suggest that you first check with "ssh-add -l" 
whether your own public/private key pair is loaded in the 
authentication agent, and remove it for the duration of the test if it 
is. Run the test with the ssh option -a to disable forwarding of the 
authentication agent connection, and add -v to see what ssh tries to do.
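
As a rough sketch of that test, something like the following could be 
run on the compute node, as the user who submitted the job. The function 
names are hypothetical, the server name "manager" is only a guess based 
on the job id 49.manager, and BatchMode=yes is my own addition (not 
strictly needed) so that ssh fails instead of prompting, which is closer 
to what a non-interactive copy sees. If you do not know which node ran 
the job, "tracejob 49" on the server may tell you, assuming the server 
logs are still available.

```shell
#!/bin/sh
# Sketch only: run on the compute node that executed the job,
# as the user who submitted it.

# Build the ssh test command for a given server host name.
# -a (no agent forwarding) and -v (verbose) are the options suggested
# above; BatchMode=yes is an added assumption so that ssh fails
# rather than asking an interactive question.
ssh_test_command() {
    server="$1"   # assumed host name of the Torque server, e.g. "manager"
    printf 'ssh -a -v -o BatchMode=yes %s true\n' "$server"
}

# Show the command, then run it. The remote command "true" does
# nothing; only the connection attempt and its debug output matter.
run_ssh_test() {
    cmd=$(ssh_test_command "$1")
    echo "Running: $cmd"
    $cmd
}

# Usage (not executed here):
#   run_ssh_test manager
```

If this fails with the same "Host key verification failed" message, the 
-v output should show which known_hosts files ssh consulted on that 
node.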

Hope this helps,

-- 
Michel Béland, scientific computing analyst
michel.beland at calculquebec.ca
office S-250, Roger-Gaudry building (main), Université de Montréal
phone: 514 343-6111 ext. 3892     fax: 514 343-2155
Calcul Québec (www.calculquebec.ca)
Calcul Canada (calculcanada.ca)


