[torqueusers] No contact with server at hostaddr problem
Carbo, Timothy J.
TIMOTHY.J.CARBO at saic.com
Tue Jul 10 15:10:19 MDT 2007
Yes, that helps a bunch! I verified that pbs_server was in fact
listening on port 15000 on cree. I restarted pbs_mom on huron with
pbs_mom -S 15000 and everything works fine now.
Thanks again. Now I can do some real work :-)
From: nathaniel.x.woody at gsk.com [mailto:nathaniel.x.woody at gsk.com]
Sent: Tuesday, July 10, 2007 2:20 PM
To: Carbo, Timothy J.
Cc: torqueusers at supercluster.org
Subject: RE: [torqueusers] No contact with server at hostaddr problem
I have managed to mis-configure pbs to give me these symptoms in two
ways:
1) pbs_server isn't running on the port that the mom thinks it is. Make
sure that the pbs_server is running on 15001 (looks like you're already
looking at this). As mentioned not long ago, you can start the mom with
pbs_mom -S 15000 to force mom to look for the server at port 15000.
(though I suppose a wise thing to do may be to see what port pbs_server
is running on first).
2) Fudged up ethernet names for the server on the mom (personally, I've
done this with multi-homed servers). Does the mom-node (cree?) know who
huron(server) is (and vice versa)? Is that an entry in /etc/hosts on
the mom-node? Being a wimp, I almost always use the ip of the server in
the config file for the pbsserver entry to avoid making this mistake.
I suppose the third option is that pbs_server isn't actually running at
all.
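The first two checks above could be scripted roughly as follows. This is only a sketch: listening_ports is a made-up helper name, not a TORQUE tool, and it assumes the Linux net-tools netstat, whose -tlnp output has the local address in column 4, the state in column 6 and PID/program in column 7 (the program column is only filled in when run as root).

```shell
# Hypothetical helper (not part of TORQUE): print the TCP port(s) a named
# daemon is listening on, given `netstat -tlnp`-style lines on stdin.
listening_ports() {
    awk -v name="$1" '
        $6 == "LISTEN" && $7 ~ ("/" name "$") {
            n = split($4, parts, ":")   # handles 0.0.0.0:15000 and :::15000
            print parts[n]
        }'
}

# Intended use on the server (cree), as root so the program column is shown:
#   netstat -tlnp | listening_ports pbs_server
# Then, on the mom node (huron), check that the server name resolves and
# start the mom against whatever port was printed, e.g. 15000:
#   getent hosts cree
#   pbs_mom -S 15000
```

If listening_ports prints nothing, pbs_server is either not running or netstat was not run with enough privilege to show program names.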
Hope that helps,
From: "Carbo, Timothy J." <TIMOTHY.J.CARBO at saic.com>
Sent by: torqueusers-bounces at supercluster.org
To: "Garrick Staples" <garrick at usc.edu>, torqueusers at supercluster.org
Subject: RE: [torqueusers] No contact with server at hostaddr problem (followup)
Sorry I wasn't clear.
My setup is:
Node1 (cree): running pbs_server, pbs_mom and maui
Node2 (huron): running pbs_mom only
When I submit the following on cree
echo "sleep 30" | qsub
the job appears to be scheduled on huron and runs OK, but then I start
seeing the "No contact with server at hostaddr port 15001" error
message repeated in the mom_logs file on huron, and it appears that
pbs_server is never notified that the job ran to completion.
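To see which server port the mom on huron is actually trying, the port numbers can be pulled out of those log messages. A rough sketch: mom_target_ports is a made-up name, the pattern assumes only that each message contains the text quoted above followed by "port N", and the log path in the usage comment assumes the default /var/spool/torque location, so adjust it for your install.

```shell
# Hypothetical helper: list the distinct server ports named in
# "No contact with server" messages in the given mom log file(s).
mom_target_ports() {
    grep -h 'No contact with server' "$@" |
        sed -n 's/.*port \([0-9][0-9]*\).*/\1/p' |
        sort -u
}

# Intended use on huron (path assumes the default TORQUE home):
#   mom_target_ports /var/spool/torque/mom_logs/*
```

Comparing that port with the one pbs_server is really listening on shows quickly whether the two daemons disagree.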
Hope this clears things up a little.
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Garrick
Staples
Sent: Tuesday, July 10, 2007 12:28 PM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] No contact with server at hostaddr problem
On Mon, Jul 09, 2007 at 09:30:09AM -0600, Carbo, Timothy J. alleged:
> Hello all.
> I was tracking the following email chain and was wondering if there is
> any resolution to the problem below. I just installed TORQUE 2.1.8
> and Maui 3.2.6-p19 on a two node system (both x86-64 bit Xeon quad core
> systems running Red Hat AS 4 update 4) and am having the same exact
> problem when I try to submit a job on my client node (jobs run fine on
> the server node). Oddly, the remote node is trying to connect to port
> 15001 on the server node but netstat -a indicates there is nothing
> listening at that port. I am pretty new to Torque, so am I missing
> something?
It is a little hard to figure out your setup here with "client",
"server", and "remote" nodes.
If both hosts are to handle compute jobs, then you want pbs_mom running
on both hosts and both hostnames in server_priv/nodes.
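One way to confirm the second part is to check the nodes file for each hostname. The sketch below is illustrative only: missing_nodes is a made-up helper, and it assumes the usual nodes file layout of one hostname as the first field of each line, optionally followed by attributes such as np=N. The path in the usage comment assumes the default /var/spool/torque.

```shell
# Hypothetical helper: print each given host that does NOT appear as the
# first field of some line in the nodes file.
missing_nodes() {
    nodes_file=$1
    shift
    for host in "$@"; do
        # awk exits 0 only if a line starts with exactly this hostname.
        awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$nodes_file" \
            || echo "$host"
    done
}

# Intended use on the server (path assumes the default TORQUE home):
#   missing_nodes /var/spool/torque/server_priv/nodes cree huron
# No output means both hosts are listed; `pbsnodes -a` then shows their state.
```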
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.