[torqueusers] NUMA question on build from trunk.

David Beer dbeer at adaptivecomputing.com
Tue Sep 28 12:29:43 MDT 2010


Mike,

What are the contents of your nodes file now? I admit that I have never tested configuring a NUMA mom on a different port.
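
In case it helps with the comparison, here is a minimal Python sketch (not part of TORQUE) that checks whether the ports in a nodes-file entry match what the mom reports in the server log. The function names are made up for illustration. The 15002/15003 defaults are simply the values shown in the pbsnodes output quoted below, and pairing mom_port with mom_service_port and rm_port with mom_manager_port is an assumption based on the values in this thread.

```python
import re

# Assumed defaults, taken from the pbsnodes output quoted in this thread.
DEFAULT_PORTS = {"mom_service_port": 15002, "mom_manager_port": 15003}


def parse_nodes_line(line):
    """Split one nodes-file entry into (hostname, {attribute: value})."""
    fields = line.split()
    name, attrs = fields[0], {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if sep:  # bare properties without '=' are skipped
            attrs[key] = value
    return name, attrs


def ports_from_log(log_text):
    """Return the last (mom_port, rm_port) pair seen in the log, or None."""
    matches = re.findall(r"mom_port\s+(\d+)\s+-\s+rm_port\s+(\d+)", log_text)
    if not matches:
        return None
    mom_port, rm_port = matches[-1]
    return int(mom_port), int(rm_port)


def port_mismatches(nodes_line, log_text):
    """List differences between the configured and the reported ports."""
    _, attrs = parse_nodes_line(nodes_line)
    reported = ports_from_log(log_text)
    if reported is None:
        return ["no mom_port/rm_port report found in log"]
    problems = []
    # Assumed pairing: mom_port <-> mom_service_port, rm_port <-> mom_manager_port.
    for key, seen in zip(("mom_service_port", "mom_manager_port"), reported):
        configured = int(attrs.get(key, DEFAULT_PORTS[key]))
        if configured != seen:
            problems.append(f"{key}: nodes file {configured}, mom reports {seen}")
    return problems
```

Calling port_mismatches() with the nodes-file line and the relevant chunk of the server log returns an empty list when the two agree, and otherwise one message per port that differs.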

David

----- Original Message -----
> I added mom_service_port=16302 and mom_manager_port=16303 to the
> nodes file directly, but the server still seems to have trouble
> accessing the mom. I was able to get it to respond on port 1500x.
> I was also not able to add these settings via qmgr. Now when I run
> pbsnodes -a, I get:
> # pbsnodes -a
> pbsnodes: No nodes found
> 
> 09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;rpp request received on stream 0
> 09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;inter-server request received
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from stream 0 (version 1)
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from stream <myip4address>.116:1022: mom_port 16302 - rm_port 16303
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message HELLO (1) received from mom on host styx.<mydomainname> (<myip4address>.116:1022) (stream 0)
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;HELLO received from styx.<mydomainname>
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;Add cluster addrs to styx.<mydomainname>
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;add_cluster_addrs;adding node[0] interface[0] <myip4address>.116 to hello response
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;sending cluster-addrs to node styx.<mydomainname>
> 09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;rpp request received on stream 0
> 09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;inter-server request received
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from stream 0 (version 1)
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from stream <myip4address>.116:1022: mom_port 16302 - rm_port 16303
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message STATUS (4) received from mom on host styx.<mydomainname> (<myip4address>.116:1022) (stream 0)
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;IS_STATUS received from styx.<mydomainname>
> 09/28/2010 12:45:05;0040;PBS_Server;Req;is_stat_get;received status from node styx.<mydomainname>
> 09/28/2010 12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_stat_get, Could not find NUMA index 0 for node styx.<mydomainname>
> 09/28/2010 12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::No child processes (10) in is_request, IS_STATUS error 10 on node styx.<mydomainname>
> 09/28/2010 12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_request, Protocol failure in commit from styx.<mydomainname>(<myip4address>.116:1022)
> 09/28/2010 12:45:05;0040;PBS_Server;Req;update_node_state;adjusting state for node styx.<mydomainname> - state=2, newstate=2
> 
> 
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of David Beer
> Sent: Tuesday, September 28, 2010 10:27 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] NUMA question on build from trunk.
> 
> 
> > # pbsnodes -a
> > styx.<mydomainname>-0
> >      state = down
> >      np = 2
> >      ntype = cluster
> >      mom_service_port = 15002
> >      mom_manager_port = 15003
> >
> > styx.<mydomainname>-1
> >      state = down
> >      np = 2
> >      ntype = cluster
> >      mom_service_port = 15002
> >      mom_manager_port = 15003
> >
> > shows my mom ports to be on 15002 etc., but my mom was started as
> >
> > pbs_mom -S 16301 -M 16302 -R 16303
> > pbs_server -S styx.<mydomainname>:72559 -p 16301 -M 16302 -R 16303
> >
> > I set the nodes file as
> >
> > styx.<mydomainname> np=4 num_numa_nodes=2
> >
> 
> You need to change the nodes file slightly:
> 
> styx.<mydomainname> np=4 num_numa_nodes=2 mom_service_port=16302 mom_manager_port=16303
> 
> That way the server knows where to send the requests.
> 
> --
> David Beer | Senior Software Engineer
> Adaptive Computing

-- 
David Beer | Senior Software Engineer
Adaptive Computing

