[torqueusers] Question about the difference between a node where pbs_server is run and a compute node

Bas van der Vlies basv at sara.nl
Thu Apr 29 01:26:06 MDT 2010

On 28 apr 2010, at 20:37, Garrick Staples wrote:

> On Wed, Apr 28, 2010 at 08:05:08PM +0200, Bas van der Vlies alleged:
>> Just a question: is there a switch in configure to switch back to the old pbs_iff behaviour?
> What old pbs_iff behaviour? The unix domain socket code has been there since the 2.1.x days.

Garrick, can you explain why our 2.1.11 pbs utilities use the 'pbs_iff' interface to communicate with the pbs_server when they run on the node where the pbs_server is started? We do not have any problems there, because a child is created and the pbs_server can accept connections again. So in this installation /tmp/.torque-unix is not used at all, or does it have a different name?

When we run the same utilities on a 2.4.7 installation, /tmp/.torque-unix is used and no child is created. The problem might be that the server handles only one connection at a time when /tmp/.torque-unix is used. So when I do a pbs_connect() and let it linger, it will eventually time out, but the pbs_server does not accept any other connections until that timeout.
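To illustrate the kind of behaviour I suspect, here is a minimal Python sketch. It is not Torque code: the socket path, the function names and the "serve one client at a time" loop are all assumptions made for the demonstration. It only shows that a server which serves one unix domain socket connection at a time keeps a second client waiting for as long as the first one lingers.

```python
import os
import socket
import tempfile
import threading
import time


def serial_server(path, ready, served):
    """Hypothetical server: serves clients strictly one at a time (no child, no select)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        srv.listen(5)
        ready.set()
        for _ in range(2):
            conn, _addr = srv.accept()
            with conn:
                # Blocks here until this client sends data or closes its end.
                conn.recv(1024)
                served.append(time.monotonic())


def run_demo(linger_seconds=1.0):
    """Return how long the second client waited before it was served."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    ready, served = threading.Event(), []
    worker = threading.Thread(target=serial_server, args=(path, ready, served))
    worker.start()
    ready.wait()

    # First client: connects and lingers without sending or closing.
    lingering = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    lingering.connect(path)

    # Second client: connects (it is queued in the listen backlog) and sends a request.
    start = time.monotonic()
    second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    second.connect(path)
    second.sendall(b"request")

    time.sleep(linger_seconds)
    lingering.close()  # only now can the server move on to the second client
    worker.join()
    second.close()
    return served[1] - start


if __name__ == "__main__":
    print("second client waited %.2f seconds" % run_demo())
```

The second client's connect() succeeds right away, but its request is only read after the lingering client goes away, which matches what we see from the outside.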

That is why I asked whether we can use the pbs_iff interface on the pbs_server again!

It is easy to trigger: just call pbs_connect() and do not close the connection. We have tested it on:
  - Debian lenny
  - CentOS 5

I found the problem on the pbs_server:
  - /var/spool/torque/server_name

If this file contains a name that is listed in /etc/hosts, the /tmp/.torque-unix mechanism is used, and that causes the problem. If it contains a name that must be resolved by some means other than /etc/hosts, the pbs_iff interface is used; this has no problem because a child process is created.
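A quick way to see which of the two cases an installation falls into is to compare the contents of server_name with the names in /etc/hosts. The Python sketch below does only that comparison; the helper names are hypothetical, and it does not reproduce whatever lookup Torque itself performs.

```python
def names_in_hosts(hosts_text):
    """Return the set of host names and aliases found in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments; the first field is the address, the rest are names.
        fields = line.split("#", 1)[0].split()
        names.update(name.lower() for name in fields[1:])
    return names


def server_name_in_hosts(server_name_path="/var/spool/torque/server_name",
                         hosts_path="/etc/hosts"):
    """True if the configured server name is listed in the hosts file."""
    with open(server_name_path) as f:
        server_name = f.read().strip().lower()
    with open(hosts_path) as f:
        return server_name in names_in_hosts(f.read())


if __name__ == "__main__":
    print("server_name listed in /etc/hosts:", server_name_in_hosts())
```

If this prints True on the pbs_server, the installation is in the case described above where /tmp/.torque-unix is used.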

So the temporary solution is to use a name that must be resolved by DNS.  

The question is: can the unix domain socket handle more than one connection?
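At the socket level nothing prevents it. The sketch below is a generic Python example, not the pbs_server code: it multiplexes several clients on one unix domain socket with the standard selectors module, so a lingering client does not hold up the next one. The socket path, the function names and the one-request-per-connection handling are assumptions for the demonstration.

```python
import os
import selectors
import socket
import tempfile
import threading
import time


def multiplexing_server(path, ready, expected_requests, served):
    """Hypothetical server: watches the listener and all clients with one selector."""
    sel = selectors.DefaultSelector()
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(5)
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ)
    ready.set()
    while len(served) < expected_requests:
        for key, _events in sel.select(timeout=0.1):
            if key.fileobj is srv:
                # New client: start watching it, but do not wait on it.
                conn, _addr = srv.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                conn = key.fileobj
                if conn.recv(1024):
                    served.append(time.monotonic())
                # One request per connection in this sketch, then hang up.
                sel.unregister(conn)
                conn.close()
    # Close the listener and any client that is still lingering.
    for key in list(sel.get_map().values()):
        key.fileobj.close()
    sel.close()


def run_demo():
    """Return how long the second client waited while the first one lingered."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    ready, served = threading.Event(), []
    worker = threading.Thread(target=multiplexing_server,
                              args=(path, ready, 1, served))
    worker.start()
    ready.wait()

    # First client connects and lingers for the whole demo.
    lingering = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    lingering.connect(path)

    start = time.monotonic()
    second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    second.connect(path)
    second.sendall(b"request")

    worker.join()  # returns once the second client has been served
    delay = served[0] - start
    lingering.close()
    second.close()
    return delay


if __name__ == "__main__":
    print("second client waited %.3f seconds" % run_demo())
```

Here the second client is served while the first connection is still open. Whether the 2.4.7 pbs_server can be made to behave like this is a separate question.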

Just my 2 cents

Bas van der Vlies
basv at sara.nl
