[torqueusers] Problem with canonical hostnames in mom_priv/nodes file

Michael Marti michael.marti at ist.utl.pt
Mon Mar 30 19:05:06 MDT 2009

Dear All

We are using torque-2.3.6 on AIX (AIX r1blade066 3 5 00003222D100).

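On the head node, /etc/hosts has the following entries for each compute node (shown here for r1blade001):

    r1blade001 r1blade001m r1blade001q      # Rack 1, BladeCenter1, blade 1
    r1blade001 r1blade001i                  # Rack 1, BladeCenter1, blade 1

So the bare name r1blade001 appears on both lines, while r1blade001m appears on only one.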

If we specify the nodes with the m suffix (as in r1blade001m) in the
file server_priv/nodes, everything works. However, if we specify the
host without the suffix (as in r1blade001), pbs_server exits with the
following error:

    PBS_Server: process_host_name_part, no valid IP addresses found for 'r1blade001' - check name service
    PBS_Server: pbsd_init(setup_nodes), could not create node "r1blade001", error = 15010
    PBS_Server: PBS_Server, pbsd_init failed

In src/server/node_func.c, in the function process_host_name_part(),
the host's IP addresses are not counted when there is more than one
address at line 970. Essentially, there should be one more section
counting the IP addresses after line 1126.
This matches the symptom above: given r1blade001m, there is only one
IP address at line 970; given r1blade001, there are two.

A quick and dirty fix is to set the second address to NULL just before
line 970, which forces the server to always assume a non-canonical
name, a case the code handles correctly (at the cost of ignoring the
second address). My line 969 of src/server/node_func.c reads:

    h_addr_list[1] = NULL;

This works for us.

A better solution, of course, would be to move the IP-counting code
out of the if clause on line 970, so that the addresses are counted in
every case.

Best regards,
Michael Marti

Michael Marti
Instituto Superior Técnico
Instituto de Plasmas e Fusão Nuclear
Complexo Interdisciplinar
Av. Rovisco Pais
1049-001 Lisboa

Tel:       +351 218 419 379
Fax:      +351 218 464 455
Mobile:  +351 968 434 327
