[torqueusers] No contact with server at hostaddr problem (followup)

Garrick Staples garrick at usc.edu
Tue Jul 10 14:19:43 MDT 2007


On Tue, Jul 10, 2007 at 01:16:30PM -0600, Carbo, Timothy J. alleged:
> Garrick:
> 
> Sorry I wasn't clear.
> 
> My setup is:
> 
> Node1 (cree):  running pbs_server, pbs_mom and maui
> 
> server_priv/nodes:
> cree np=8
> Huron np=8
> 
> mom_priv/config:
> $pbsserver cree
> 
> Node2 (huron):  running pbs_mom only
> 
> mom_priv/config:
> $pbsserver cree
> 
> When I submit the following on cree
> 
> echo "sleep 30" | qsub
> 
> the job appears to be scheduled on huron and runs OK, but then I start
> seeing the "No contact with server at hostaddr port 15001" error
> message repeated in the mom_logs file on huron, and it appears that
> pbs_server is never notified that the job ran to completion.
> 
> Hope this clears things up a little.

Check name services: cree and huron need to be able to resolve each
other's names to IPs that they can each reach (with matching forward
and reverse resolution).

And check port filtering. If these are Linux hosts, 'iptables-save' is
a good way to list any filtering rules.
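A rough sketch for narrowing down iptables-save output and probing the server port. Port 15001 comes from the mom_logs error above; treating 15002 and 15003 as the pbs_mom ports is an assumption based on TORQUE's usual defaults, so check your build. The function names are hypothetical, and `port_open` relies on bash's /dev/tcp redirection and coreutils timeout:

```shell
#!/bin/sh
# Sketch: look for packet filtering that could block TORQUE traffic.

# Read iptables-save output on stdin and print lines worth a closer look:
# rules naming port 15001-15003 (assumed TORQUE defaults), rules that DROP
# or REJECT, and chains whose default policy is DROP. This is a rough grep;
# port ranges (e.g. 15000:15010) and multiport lists still need reading by hand.
torque_filter_rules() {
    grep -E -e '(^|[^0-9])1500[1-3]([^0-9]|$)' \
            -e '-j (DROP|REJECT)' \
            -e '^:[^ ]+ DROP'
}

# Try a TCP connection to HOST PORT, giving up after 5 seconds.
# Needs bash (for /dev/tcp) and timeout(1) to be installed.
port_open() {
    timeout 5 bash -c ": < /dev/tcp/$1/$2" 2>/dev/null
}
```

On huron, as root, `iptables-save | torque_filter_rules` lists candidate rules, and `port_open cree 15001 && echo reachable` tests the path the failing mom uses. Repeating the probe from cree toward huron's mom ports covers the other direction.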

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html