[torqueusers] Remote submit host: qsub -I fails
Bill Wichser
bill at princeton.edu
Sat Jun 20 19:36:34 MDT 2009
I have a host (server) that runs the TORQUE server as well as the Maui
scheduler, and a login node (login1) set up to submit jobs to that
server. My version of TORQUE is 2.3.6.
From the login host I can submit batch jobs and use qstat and showq
without problems, but I cannot start an interactive job. Here is the output:
% qsub -I -l nodes=2:ppn=1,walltime=10:00
qsub: waiting for job 39.server to start
qsub: job 39.server apparently deleted
I cannot run tracejob on the login node, but tracejob on the server
shows:
Job: 39.server
06/20/2009 20:53:29 S enqueuing into default, state 1 hop 1
06/20/2009 20:53:29 S dequeuing from default, state QUEUED
06/20/2009 20:53:29 S enqueuing into short, state 1 hop 1
06/20/2009 20:53:29 S Job Queued at request of bill@login1, owner =
bill@login1, job name = STDIN, queue = short
06/20/2009 20:53:29 A queue=default
06/20/2009 20:53:29 A queue=short
06/20/2009 20:53:30 S Job Modified at request of root@server
06/20/2009 20:53:30 S Job Run at request of root@server
06/20/2009 20:53:30 S Job Modified at request of root@server
06/20/2009 20:53:30 S Exit_status=-1 resources_used.cput=00:00:00
resources_used.mem=0kb resources_used.vmem=0kb
resources_used.walltime=00:00:00
Error_Path=/dev/pts/0
Output_Path=/dev/pts/0
Note the Exit_status=-1, which one discussion on this list attributed
to an /etc/resolv.conf issue.
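For anyone scanning longer tracejob output for the same symptom, here is
a minimal Python sketch that pulls the Exit_status value out of the text.
The function name find_exit_status is my own, for illustration only, and
is not part of TORQUE; it assumes the Exit_status=<integer> form shown in
the trace above.

```python
import re

# Matches "Exit_status=" followed by an optionally negative integer,
# as it appears in the tracejob output above.
_EXIT_STATUS_RE = re.compile(r"Exit_status=(-?\d+)")


def find_exit_status(tracejob_text):
    """Return the first Exit_status in the text as an int, or None if absent.

    Hypothetical helper for illustration; not a TORQUE API.
    """
    match = _EXIT_STATUS_RE.search(tracejob_text)
    if match is None:
        return None
    return int(match.group(1))
```

A negative result, as in my case, seems to point at the job failing to
start on the node rather than at anything the job itself ran.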
Checking /var/log/messages on a compute node, I find pbs_mom complaining
about my multi-homed host:
Jun 20 21:27:55 r6c2n1 pbs_mom: No route to host (113) in
TMomFinalizeChild, cannot open interactive qsub socket to host
login1:52427 - 'cannot bind to port 1023 in client_to_svr - connection
refused' - check routing tables/multi-homed host issues
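The message suggests two things worth checking from the compute node:
which addresses login1 resolves to there, and whether a TCP connection
back to the submit host can be opened at all. A minimal Python sketch,
assuming nothing beyond the standard socket module; resolve_addresses
and can_connect are hypothetical helper names, and an unprivileged test
like this does not reproduce pbs_mom binding a privileged source port.

```python
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses hostname resolves to.

    Raises socket.gaierror if the name cannot be resolved on this machine.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "no route to host", "connection refused", and timeouts.
        return False
```

Run on r6c2n1, resolve_addresses("login1") would show whether the node
sees the local-network address or the external one. The port that
qsub -I listens on (52427 above) appears to be chosen per invocation and
is only open while qsub is waiting, so can_connect("login1", port) is
only meaningful against it during that window.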
Both my server and login nodes are multi-homed. Every host has its local
addresses in /etc/hosts. I've added to /var/spool/PBS/torque.cfg a line:
SERVERHOST server
on my server, believing that a hostname is needed here rather than an
actual IP address. Regardless, the interactive session is trying to get
back to a remote submit host which is also multi-homed.
Before I go down the path of assigning a different hostname for the
local network (say, login1-sn23), does anyone have any experience with
this type of setup? Am I on the right track here?
Thanks,
Bill