[torqueusers] error on qsub/mpirun jobs

Zhiliang Hu zhu at iastate.edu
Mon Sep 8 09:28:26 MDT 2008


I have an mpiblast job that runs well from the command line with "mpirun",
but it fails with errors when I submit it with "qsub":

qsub -l nodes=6:ppn=2 \
     -e /path/to/locationA \
     -o /path/to/locationA \
     /path/to/program

----------------------------------------------------------
Unable to copy file /var/spool/torque/spool/658.nagrp2..ER to
hu@hist:/raid/pub/ncbi/blast/www/mpiblast.tmp
>>> error from copy
Host key verification failed.
lost connection
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/658.nagrp2..ER
----------------------------------------------------------

-- When I check manually, the "retained" file "/var/spool/torque/undelivered/658.nagrp2..ER" is not there.

-- I wonder why "Host key verification failed" appears, since I can ssh to all nodes and run the job with mpirun with no problem. Could something in torque be producing the above, possibly misleading, errors?
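
One check that might narrow this down is sketched below. It assumes that torque delivers the output files with scp from the compute node back to the submit host ("hist" in the message above), so the direction that matters would be node to hist rather than hist to node. The function name check_host and the SSH override variable are illustrative only and are not part of torque.

```shell
#!/bin/sh
# Sketch: test whether a non-interactive ssh to a given host succeeds.
# Assumption: torque copies job output back with scp, which cannot answer
# a host-key prompt, so BatchMode=yes is used here to mimic that.

# check_host HOST
# Prints "ok: HOST" if ssh works without any prompt, "failed: HOST" otherwise.
# SSH may be set to another command for a dry run (hypothetical override).
check_host() {
    if ${SSH:-ssh} -o BatchMode=yes "$1" true; then
        echo "ok: $1"
    else
        echo "failed: $1"
    fi
}
```

To use it, source the file on the compute node that ran the job, as the user who submitted the job, and call "check_host hist". If it reports "failed: hist" with a host key message, that node likely has no matching known_hosts entry for the submit host.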

Any hints on where to look further would be appreciated.

Zhiliang


