[torqueusers] pbs_server error 15008

Lyon, Robert rlyon at uidaho.edu
Fri May 2 14:59:58 MDT 2008


>George Zikos wrote:
>> Adrian Sevcenco wrote:
>>> George Zikos wrote:
>>>> Hi,
>>>> it looks like a firewall problem to me.  You can try to make sure
>>>> that the server and all the nodes have the correct /etc/hosts and
>>>> that your firewall allows pbs_mom to communicate with the nodes. From
>>>> the troubleshooting page of torque: "TORQUE pbs_mom daemons use UDP
>>>> port 1023 and the pbs_server/pbs_mom daemons use ports 15001-15004
>>>> by default".
>>>>
>>>> George
>>> Thanks, it seems that this is the problem! To be sure, I stopped the
>>> firewall on the worker nodes and, just as a test, did the same on the
>>> server, and everything was all right :( . This means that I have a
>>> problem with setting up the firewall ... I have this in iptables:
>>> -A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 1023 -j ACCEPT
>>> -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 15001:15004 -j ACCEPT
>>> -A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 15001:15004 -j ACCEPT
>>> Can you or someone spot the mistake? In principle it should work
>>> without any problem... should I drop the --state?
>>> Thank you,
>>> Best regards,
>>> Adrian
>>>
>> Hi Adrian,
>> 
>> I think you just need to edit /etc/services rather than imposing rules
>> on iptables. Just add the following lines:
>> 
>> pbs_mom         1023/udp
>> pbs             15001/tcp
>> pbs_server      15001/udp
>> pbs_mom         15002/tcp
>> pbs_mom         15002/udp
>> pbs_resmom      15003/tcp
>> pbs_resmom      15003/udp
>> pbs_sched       15004/tcp
>> pbs_sched       15004/udp
>> 
>> and the ports will open for torque. You can skip the last two if you 
>> are using maui and not pbs_sched as scheduler.
>> 
>> 
>> George
>Thanks George, but they are already in there :( and I have the same
>problems. Can somebody help me with an example of iptables for a torque
>server?
>Thank you,
>Best regards,
>Adrian

Hi Adrian,

There are a few options for getting your firewall to work. If your
compute nodes are in a private network that is only accessible from the
head node, and the firewall is only there for incoming internet
connections to the head node, then it is probably safe to allow all
traffic from the device connected to the private network by adding that
to your rules ( -A INPUT -i $LOCAL_DEVICE -j ACCEPT ) and getting rid of
the more specific rules.
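
As a rough sketch of that first option, assuming your private network
hangs off eth1 (an assumed device name, substitute your own) and that
you keep the RH-Firewall-1-INPUT chain from your current rules, the
line would look something like this. It has to come before the final
REJECT rule in that chain to have any effect:

```
# eth1 is an assumption: use the interface that faces the compute nodes
-A RH-Firewall-1-INPUT -i eth1 -j ACCEPT
```

With that in place, the per-port torque rules are no longer needed for
traffic arriving on the private interface.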

If you really need the firewall between the nodes, check to see if you
are allowing established connections. In general, once a new connection
is made it is then in an established state, and your rules only match
the NEW state, so it looks as if you may be dropping anything but the
initial connection.  Try adding a line to your configuration somewhere
near the top that allows all established/related connections
( -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT ) or adding
the ESTABLISHED state to the rule for the actual port.
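
To make the ordering concrete, here is a minimal sketch of how the
relevant part of the rules file might look. It reuses your chain name
and port range; the closing REJECT line is what Red Hat's default
ruleset normally ends with and is an assumption about your file:

```
# Let packets that belong to already-accepted connections through first
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Accept new tcp connections to the torque port range
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 15001:15004 -j ACCEPT
# Alternative to the first line: match both states on the port rule itself
# -A RH-Firewall-1-INPUT -p tcp -m state --state NEW,ESTABLISHED -m tcp --dport 15001:15004 -j ACCEPT
# Assumed final rule of the chain: everything else is rejected
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
```

Rules are evaluated top to bottom, so the ACCEPT lines only help if
they sit above whatever rule rejects or drops the traffic.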

I also noticed that you have rules defined for the udp protocol, but
torque uses the tcp protocol.  In addition, check to see if you're
allowing the responses in the OUTPUT chain as well.
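
This only matters if your OUTPUT chain has a DROP policy or its own
reject rules ( iptables -L OUTPUT -n -v will show you ). If it does, a
sketch of what to add, again assuming the default torque port range,
would be:

```
# Replies for connections that were accepted on the way in
-A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# New outgoing tcp connections from this host to torque daemons elsewhere
-A OUTPUT -p tcp -m state --state NEW -m tcp --dport 15001:15004 -j ACCEPT
```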

-
Rob

