[torqueusers] Remote submit host: qsub -I fails

Prakash Velayutham prakash.velayutham at cchmc.org
Sun Jun 21 05:14:37 MDT 2009


I had the exact same issue two months back, and it turned out to be an
incorrect entry for the submit host in that host's own /etc/hosts file.
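
For anyone checking the same thing, here is a minimal Python sketch of the
two tests discussed in this thread: that the hostname maps to the address you
expect, and that it is the first name listed on its line. The helper names
(parse_hosts, check_host) and the addresses in the sample are hypothetical,
and the sketch assumes the resolver uses the first matching line in the file,
which is the usual behaviour but worth confirming on your system.

```python
def parse_hosts(text):
    """Return a list of (address, [names]) from /etc/hosts-style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def check_host(text, hostname, expected_addr):
    """Return a list of problems with how hostname appears in the hosts text."""
    matches = [(addr, names) for addr, names in parse_hosts(text)
               if hostname in names]
    if not matches:
        return [f"{hostname} has no entry"]
    problems = []
    addr, names = matches[0]  # assumption: the first matching line wins
    if addr != expected_addr:
        problems.append(f"{hostname} maps to {addr}, expected {expected_addr}")
    if names[0] != hostname:
        problems.append(f"{hostname} is an alias of {names[0]}, not the first name")
    return problems


# Hypothetical sample: the hostname was also added to a loopback line.
SAMPLE = """\
127.0.0.1   localhost
127.0.0.1   login1 localhost.localdomain
10.0.23.5   login1-sn23 login1
"""

for problem in check_host(SAMPLE, "login1", "10.0.23.5"):
    print(problem)
```

To run it against a real file, read /etc/hosts on the submit host and pass its
contents as the first argument, with the address the compute nodes should use
to reach that host as the third.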

Prakash

On Jun 21, 2009, at 12:34 AM, Smith, Jerry Don II wrote:

> Does the internal hostname of your server resolve to the one you
> assigned in $PBS_HOME/server_name?
>
> And is it the first assigned alias for that machine in /etc/hosts?
>
> --Jerry
>
>
> ----- Original Message -----
> From: torqueusers-bounces at supercluster.org <torqueusers-bounces at supercluster.org>
> To: torqueusers at supercluster.org <torqueusers at supercluster.org>
> Sent: Sat Jun 20 19:36:34 2009
> Subject: [torqueusers] Remote submit host:  qsub -I fails
>
> I have a host set up to run the torque server as well as maui scheduler
> (server).  I also have a login node set up to send jobs to this torque
> server (login1).  My version of torque is 2.3.6.
>
> While I can submit jobs fine from this login host, and use qstat and showq,
> I cannot submit an interactive job.  Here is the output:
>
> % qsub -I -l nodes=2:ppn=1,walltime=10:00
> qsub: waiting for job 39.server to start
> qsub: job 39.server apparently deleted
>
> While I cannot run a tracejob on this login node, a tracejob on the
> server shows:
>
>
> Job: 39.server
>
> 06/20/2009 20:53:29  S    enqueuing into default, state 1 hop 1
> 06/20/2009 20:53:29  S    dequeuing from default, state QUEUED
> 06/20/2009 20:53:29  S    enqueuing into short, state 1 hop 1
> 06/20/2009 20:53:29  S    Job Queued at request of bill at login1, owner =
>                           bill at login1, job name = STDIN, queue = short
> 06/20/2009 20:53:29  A    queue=default
> 06/20/2009 20:53:29  A    queue=short
> 06/20/2009 20:53:30  S    Job Modified at request of root at server
> 06/20/2009 20:53:30  S    Job Run at request of root at server
> 06/20/2009 20:53:30  S    Job Modified at request of root at server
> 06/20/2009 20:53:30  S    Exit_status=-1 resources_used.cput=00:00:00
>                           resources_used.mem=0kb resources_used.vmem=0kb
>                           resources_used.walltime=00:00:00
>                           Error_Path=/dev/pts/0 Output_Path=/dev/pts/0
>
> Note the Exit_status=-1, which one discussion on this list attributed
> to an /etc/resolv.conf issue.
>
> Checking /var/log/messages on a node, I find pbs_mom spitting out info
> about my multihomed host:
>
> Jun 20 21:27:55 r6c2n1 pbs_mom: No route to host (113) in
> TMomFinalizeChild, cannot open interactive qsub socket to host
> login1:52427 - 'cannot bind to port 1023 in client_to_svr - connection
> refused' - check routing tables/multi-homed host issues
>
> Both my server and login nodes are multi-homed.  Every host has local
> addresses in /etc/hosts.  On my server, I've added to
> /var/spool/PBS/torque.cfg the line:
> SERVERHOST  server
> believing that a hostname string is needed here rather than an actual IP.
> Regardless, the interactive session is trying to get back to a remote
> submit host which is also multi-homed.
>
> Before I go down the path of assigning a different hostname for the
> local network (login1-sn23, say), does anyone have any experience with
> this type of setup?  Am I on the right path here?
>
> Thanks,
> Bill
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
