[torqueusers] NUMA question on build from trunk.

Mike Coyne Mike.Coyne at PACCAR.com
Tue Sep 28 12:04:05 MDT 2010


I added mom_service_port=16302 and mom_manager_port=16303 to the
nodes file directly, but the server still seems to have trouble
reaching the mom. I was able to get it to respond on port 1500x. I was
also not able to add these settings via qmgr. When I run pbsnodes, no
nodes are listed:
#pbsnodes -a
pbsnodes: No nodes found 

09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;rpp request received on
stream 0
09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;inter-server request
received
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from
stream 0 (version 1)
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from
stream <myip4address>.116:1022: mom_port 16302  - rm_port 16303
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message HELLO (1)
received from mom on host styx.<mydomainname> (<myip4address>.116:1022)
(stream 0)
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;HELLO received from
styx.<mydomainname>
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;Add cluster addrs to
styx.<mydomainname>
09/28/2010 12:45:05;0004;PBS_Server;Svr;add_cluster_addrs;adding node[0]
interface[0] <myip4address>.116 to hello response
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;sending cluster-addrs
to node styx.<mydomainname>
09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;rpp request received on
stream 0
09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;inter-server request
received
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from
stream 0 (version 1)
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from
stream <myip4address>.116:1022: mom_port 16302  - rm_port 16303
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message STATUS (4)
received from mom on host styx.<mydomainname> (<myip4address>.116:1022)
(stream 0)
09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;IS_STATUS received
from styx.<mydomainname>
09/28/2010 12:45:05;0040;PBS_Server;Req;is_stat_get;received status from
node styx.<mydomainname>
09/28/2010
12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_stat_get, Could
not find NUMA index 0 for node styx.<mydomainname>
09/28/2010 12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::No child
processes (10) in is_request, IS_STATUS error 10 on node
styx.<mydomainname>
09/28/2010
12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_request, Protocol
failure in commit from styx.<mydomainname>(<myip4address>.116:1022)
09/28/2010 12:45:05;0040;PBS_Server;Req;update_node_state;adjusting
state for node styx.<mydomainname> - state=2, newstate=2
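
To pick the failures out of a longer server log, something like the
following Python sketch may help. It assumes the record layout seen in
the excerpt above (date, time, then semicolon-separated fields) and
that a record wrapped by the mail client can be re-joined by treating
any line that does not start with a date as a continuation. The field
names in KEYS are labels chosen here for illustration, not names taken
from TORQUE.

```python
import re

# A record is assumed to start with a date such as 09/28/2010.
DATE_START = re.compile(r"^\d{2}/\d{2}/\d{4}(?= |$)")

# Layout inferred from the excerpt: date time;code;daemon;class;name;message
RECORD = re.compile(
    r"^(\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}:\d{2});([0-9A-Fa-f]{4});"
    r"([^;]*);([^;]*);([^;]*);(.*)$"
)

# Illustrative labels only (an assumption, not TORQUE terminology).
KEYS = ("date", "time", "code", "daemon", "obj_class", "obj_name", "message")


def join_wrapped(text):
    """Re-join records that were split across lines by mail wrapping."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if DATE_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def parse(record):
    """Split one joined record into a dict, or return None if it does not fit."""
    match = RECORD.match(record)
    return dict(zip(KEYS, match.groups())) if match else None


def errors(text):
    """Return the parsed records whose message starts with LOG_ERROR."""
    parsed = (parse(r) for r in join_wrapped(text))
    return [p for p in parsed if p and p["message"].startswith("LOG_ERROR")]


SAMPLE = """\
09/28/2010 12:45:05;0040;PBS_Server;Req;is_stat_get;received status from
node styx.<mydomainname>
09/28/2010
12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_stat_get, Could
not find NUMA index 0 for node styx.<mydomainname>
"""

if __name__ == "__main__":
    for rec in errors(SAMPLE):
        print(rec["time"], rec["message"])
```

Run against the full excerpt, this should leave only the three
LOG_ERROR records, starting with the NUMA index failure.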


-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of David Beer
Sent: Tuesday, September 28, 2010 10:27 AM
To: Torque Users Mailing List
Subject: Re: [torqueusers] NUMA question on build from trunk.


> # pbsnodes -a
> styx.<mydomainname>-0
>      state = down
>      np = 2
>      ntype = cluster
>      mom_service_port = 15002
>      mom_manager_port = 15003
>
> styx.<mydomainname>-1
>      state = down
>      np = 2
>      ntype = cluster
>      mom_service_port = 15002
>      mom_manager_port = 15003
>
> shows my momports to be on 15002 etc but my mom was started as
>
> pbs_mom -S 16301 -M 16302 -R 16303
> pbs_server -S styx.<mydomainname>:72559 -p 16301 -M 16302 -R 16303
>
> I set the node files as
>
> styx.<mydomainname> np=4 num_numa_nodes=2

You need to change the nodes file slightly:

styx.<mydomainname> np=4 num_numa_nodes=2 mom_service_port=16302 mom_manager_port=16303

That way the server knows where to send the requests.
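
As a quick sanity check, a nodes file entry can be compared against the
ports the mom was actually started with. The Python sketch below is only
an illustration: it assumes an entry is a host name followed by
whitespace-separated key=value attributes, as in the line above, and the
helper names parse_node_line and port_mismatches are made up here.

```python
def parse_node_line(line):
    """Split a nodes file entry into (host name, attribute dict)."""
    tokens = line.split()
    name, attrs = tokens[0], {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        # Attributes without "=" are kept as flags.
        attrs[key] = value if sep else True
    return name, attrs


def port_mismatches(line, expected):
    """Return {attribute: (found, wanted)} for every port that differs."""
    _, attrs = parse_node_line(line)
    return {
        key: (attrs.get(key), str(port))
        for key, port in expected.items()
        if attrs.get(key) != str(port)
    }


# Ports taken from this thread: pbs_mom -M 16302 -R 16303
EXPECTED = {"mom_service_port": 16302, "mom_manager_port": 16303}

OLD_ENTRY = "styx.<mydomainname> np=4 num_numa_nodes=2"
NEW_ENTRY = (
    "styx.<mydomainname> np=4 num_numa_nodes=2 "
    "mom_service_port=16302 mom_manager_port=16303"
)

if __name__ == "__main__":
    print(port_mismatches(OLD_ENTRY, EXPECTED))
    print(port_mismatches(NEW_ENTRY, EXPECTED))
```

With the original entry both ports are reported as missing; with the
amended entry the result is empty.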

-- 
David Beer | Senior Software Engineer
Adaptive Computing

