[torqueusers] Queue Problem

Jurgens de Bruin debruinjj at gmail.com
Fri Sep 13 01:45:49 MDT 2013


Problem solved


On 12 September 2013 15:37, Michel Béland <michel.beland at calculquebec.ca> wrote:

> Jurgens de Bruin wrote :
> > Hi
> >
> > This is driving me crazy...
> >
> > I have 3 queues: a default batch queue and two additional "specialized"
> > queues. If I submit a job to two of the queues, the job executes without
> > any problems, but one of the "specialized" queues does not seem to work.
> > This is the queue setup:
> >
> > # Create and define queue himem
> > #
> > create queue himem
> > set queue himem queue_type = Execution
> > set queue himem resources_default.neednodes = bigmem
> >
> > So queues clc and batch work perfectly, but himem produces the following
> > error:
> >
> > *** error from copy
> > Host key verification failed.
> > lost connection
> > *** end error output
> > Output retained on that host in:
> > /var/spool/torque/undelivered/49.manager.OU
> >
> > Any ideas or suggestions would be appreciated.
>
> Find out which node ran job 49, then try to ssh from that node to the
> server. To debug this, I suggest checking with "ssh-add -l" that you are
> not using your own public/private key pair, and removing it for the
> duration of the test if you have one. Test with the ssh option -a to
> disable forwarding of the authentication agent connection, and add -v to
> see what ssh tries to do.
>
> Hope this helps,
>
> --
> Michel Béland, analyste en calcul scientifique
> michel.beland at calculquebec.ca
> bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
> téléphone : 514 343-6111 poste 3892     télécopieur : 514 343-2155
> Calcul Québec (www.calculquebec.ca)
> Calcul Canada (calculcanada.ca)
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
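
For the archive, the checks suggested above could be collected roughly as
follows. This is only a sketch under assumptions: the server host name
"manager" is guessed from the job ID 49.manager.OU, "ssh-add -l" is assumed to
be the key-listing command that was meant, and the "pbsnodes :bigmem" and
"ssh-keygen -F" lines are additions that were not part of the original advice.
The helper name print_checks is made up for illustration. The script only
prints the commands, so they can be reviewed and then run on the right host.

```shell
#!/bin/sh
# Print (do not run) the debugging commands for a failed output copy.
# JOBID comes from the error output; SERVER is an assumed host name.
JOBID="${JOBID:-49}"
SERVER="${SERVER:-manager}"

print_checks() {
    jobid=$1
    server=$2
    # On the server: nodes with the bigmem property are the only candidates
    # for jobs submitted to the himem queue.
    echo "pbsnodes :bigmem"
    # On the server: search the logs for the node that ran the job.
    echo "tracejob $jobid"
    # On that node, as the submitting user: list keys held by the ssh agent.
    echo "ssh-add -l"
    # On that node: check for a known_hosts entry for the server.
    echo "ssh-keygen -F $server"
    # On that node: connect without agent forwarding (-a), verbosely (-v),
    # and without interactive prompts, so a host key problem shows as an error.
    echo "ssh -a -v -o BatchMode=yes $server true"
}

print_checks "$JOBID" "$SERVER"
```

If the last command fails with "Host key verification failed", the node
probably has no usable known_hosts entry for the server, which would match the
copy error quoted above.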



-- 
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin