[Mauiusers] Premature end of message - ?

Jerry Smith jdsmit at sandia.gov
Tue Nov 22 13:23:40 MST 2005


Marc, Qadir, All

Please correct me if I am wrong, but the nodes file is for job execution hosts only.  We encountered this problem and got it working by making sure that the name of the internal interface that talks to the execution hosts (the name in the PBS server_name file) is listed first among the names for that IP in /etc/hosts.  We also have all nodes listed in /etc/hosts.equiv.

Here is my line in /etc/hosts for the pbs_server's private network IP:

172.30.80.252   sadmin2 pbs_server 

[root@sadmin2 root]# cat /var/spool/pbs/server_name
sadmin2
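
For anyone who wants to check the same thing on their own server, here is a minimal sketch in Python. It only tests the ordering described above: that the name stored in server_name is the first name on its /etc/hosts line. The function names are made up for illustration and are not part of Torque or Maui, and the file paths are the ones from this particular install, so adjust them as needed.

```python
def first_name_for(hosts_text, hostname):
    """Return the first name on the hosts line that lists hostname, or None."""
    for raw in hosts_text.splitlines():
        # Drop trailing comments, then split into: IP, name, alias, alias, ...
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return fields[1]
    return None


def server_name_is_first(hosts_text, server_name):
    """True if server_name is the first name listed for its IP."""
    name = server_name.strip()
    return first_name_for(hosts_text, name) == name


def check_files(hosts_path="/etc/hosts",
                server_name_path="/var/spool/pbs/server_name"):
    """Run the check against real files (paths are assumptions from this post)."""
    with open(hosts_path) as hosts, open(server_name_path) as server:
        return server_name_is_first(hosts.read(), server.read())


# Sample data taken from the lines quoted in this message.
sample_hosts = "172.30.80.252   sadmin2 pbs_server \n"
sample_server_name = "sadmin2\n"
print(server_name_is_first(sample_hosts, sample_server_name))
```

Calling check_files() on the server itself runs the same test against the live files. A False result only means the ordering differs from what worked for us; it does not prove that this is the cause of the error.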


Jerry


Message: 7
Date: Tue, 22 Nov 2005 11:27:00 -0700
From: Marc Langlois <marc at keyseismic.com>
Subject: Re: [Mauiusers] Premature end of message - ?
To: Qadir Timerghazin <qadir.timerghazin at gmail.com>
Cc: mauiusers at supercluster.org
Message-ID: <1132684020.325.19.camel at dev01>
Content-Type: text/plain

Qadir,

I had a similar problem with 2 network interfaces on the Torque server
node. I added $clienthost entries for hostnames of both interfaces to
./mom_priv/config, but it didn't help. 

After increasing the debug levels in the MOMs and the server, it
appeared (IIRC) to be caused by the trusted host list being set by the
Torque server (and not by MOM). Jobs started running properly after I
added the hostnames for both interfaces to the ./server_priv/nodes file.

Hope this helps,
Marc.
 
On Mon, 2005-11-21 at 12:49, Qadir Timerghazin wrote:

>> Hello all,
>> 
>> I am trying to set up Torque/Maui on a small Linux cluster, with the
>> nodes inside a private network and a head node with two network
>> interfaces. A test installation using one of the cluster nodes as
>> the queue master and two other nodes as execution hosts worked perfectly,
>> but I cannot make it work on the head node: all the jobs get stuck and
>> checkjob gives the following:
>> 
>> --------------------------------
>> Holds:    Defer
>> Messages:  cannot start job - RM failure, rc: 15031, msg: 'Premature end
>> of message'
>> PE:  1.00  StartPriority:  1
>> cannot select job 0 for partition DEFAULT (job hold active)
>> --------------------------------
>> 
>> I tried changing the server hostname in the Torque and Maui configuration
>> files from internal to external and back, without much success. Do you
>> think the 'Premature end of message' error has to do with hostname
>> confusion, or should I look somewhere else?
>> 
>> Thank you.
>> 
>> Qadir
>  
>
-- 
Marc Langlois
Key Seismic Solutions Ltd., Calgary, AB, Canada.
marc at keyseismic.com
