[torqueusers] NUMA question on build from trunk.

Mike Coyne Mike.Coyne at PACCAR.com
Tue Sep 28 12:36:07 MDT 2010


Just the one line:

# cat nodes
styx.pbdenton.paccar.com np=4 num_numa_nodes=2 mom_service_port=16302 mom_manager_port=16303

For reference, my build is as of 4111. I built it as an RPM with the
following configure options.

%define configure_args '''--prefix=%install_prefix' --with-debug
--enable-numa-support --disable-qsub-keep-override
--enable-shell-use-argv  '--with-rcp=scp'
'--with-server-home=/var/spool/TORQUE_KAPPA' '--disable-gcc-warnings'
'--with-tcl=/opt/torque_alpha/lib/tcltk8.3/lib'
'--with-tk=/opt/torque_alpha/lib/tcltk8.3/lib'
'--with-tclx=/opt/torque_alpha/lib/tcltk8.3/lib'
'--with-tkx=/opt/torque_alpha/lib/tcltk8.3/lib' --disable-unixsockets
'--enable-cpuset' 'CPPFLAGS=-D_LARGEFILE64_SOURCE'
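
As a sanity check on the nodes entry, here is a minimal Python sketch (not
part of TORQUE) that splits a nodes-file line into its key=value attributes
and compares the two port attributes with the ports the mom was started
with (pbs_mom -M 16302 -R 16303, as quoted further down). The helper name
parse_nodes_line and the "expected" dictionary are made up for illustration,
and the parsing assumes simple whitespace-separated fields.

```python
def parse_nodes_line(line):
    """Split one nodes-file entry into (hostname, {attribute: value})."""
    fields = line.split()
    host = fields[0]
    attrs = {}
    for field in fields[1:]:
        # Only key=value fields are collected; bare words are skipped.
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return host, attrs


# The nodes entry from this message, written as a single line.
line = ("styx.pbdenton.paccar.com np=4 num_numa_nodes=2 "
        "mom_service_port=16302 mom_manager_port=16303")
host, attrs = parse_nodes_line(line)

# Assumed mapping: -M is the mom service port, -R is the manager (rm) port.
expected = {"mom_service_port": "16302", "mom_manager_port": "16303"}
mismatches = {key: (attrs.get(key), want)
              for key, want in expected.items()
              if attrs.get(key) != want}

print(host)
print(attrs)
print("mismatches:", mismatches)
```

An empty mismatches dictionary only means the nodes entry agrees with the
mom's command line; it says nothing about whether the server actually
honours those attributes for a NUMA node.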

-----Original Message-----
From: David Beer [mailto:dbeer at adaptivecomputing.com] 
Sent: Tuesday, September 28, 2010 1:30 PM
To: Mike Coyne
Cc: Torque Users Mailing List
Subject: Re: [torqueusers] NUMA question on build from trunk.

Mike,

What are the contents of your nodes file now? I admit that I have never
tested configuring a NUMA mom on a different port.

David

----- Original Message -----
> I added mom_service_port=16302 and mom_manager_port=16303 to the
> nodes file directly, but it still seems to be having issues accessing
> the mom. I was able to get it to respond on port 1500x. Also, I was not
> able to add these settings via qmgr. When I run pbsnodes I get:
>
> # pbsnodes -a
> pbsnodes: No nodes found
> 
> 09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;rpp request received on stream 0
> 09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;inter-server request received
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from stream 0 (version 1)
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from stream <myip4address>.116:1022: mom_port 16302 - rm_port 16303
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message HELLO (1) received from mom on host styx.<mydomainname> (<myip4address>.116:1022) (stream 0)
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;HELLO received from styx.<mydomainname>
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;Add cluster addrs to styx.<mydomainname>
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;add_cluster_addrs;adding node[0] interface[0] <myip4address>.116 to hello response
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;sending cluster-addrs to node styx.<mydomainname>
> 09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;rpp request received on stream 0
> 09/28/2010 12:45:05;0040;PBS_Server;Req;do_rpp;inter-server request received
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from stream 0 (version 1)
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message received from stream <myip4address>.116:1022: mom_port 16302 - rm_port 16303
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;message STATUS (4) received from mom on host styx.<mydomainname> (<myip4address>.116:1022) (stream 0)
> 09/28/2010 12:45:05;0004;PBS_Server;Svr;is_request;IS_STATUS received from styx.<mydomainname>
> 09/28/2010 12:45:05;0040;PBS_Server;Req;is_stat_get;received status from node styx.<mydomainname>
> 09/28/2010 12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_stat_get, Could not find NUMA index 0 for node styx.<mydomainname>
> 09/28/2010 12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::No child processes (10) in is_request, IS_STATUS error 10 on node styx.<mydomainname>
> 09/28/2010 12:45:05;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_request, Protocol failure in commit from styx.<mydomainname>(<myip4address>.116:1022)
> 09/28/2010 12:45:05;0040;PBS_Server;Req;update_node_state;adjusting state for node styx.<mydomainname> - state=2, newstate=2
> 
> 
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of David Beer
> Sent: Tuesday, September 28, 2010 10:27 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] NUMA question on build from trunk.
> 
> 
> > # pbsnodes -a
> > styx.<mydomainname>-0
> > state = down
> > np = 2
> > ntype = cluster
> > mom_service_port = 15002
> > mom_manager_port = 15003
> >
> > styx.<mydomainname>-1
> > state = down
> > np = 2
> > ntype = cluster
> > mom_service_port = 15002
> > mom_manager_port = 15003
> >
> > shows my mom ports to be on 15002 etc., but my mom was started as
> >
> > pbs_mom -S 16301 -M 16302 -R 16303
> > pbs_server -S styx.<mydomainname>:72559 -p 16301 -M 16302 -R 16303
> >
> > I set the nodes file as
> >
> > styx.<mydomainname> np=4 num_numa_nodes=2
> 
> You need to change the nodes file slightly:
> 
> styx.<mydomainname> np=4 num_numa_nodes=2 mom_service_port=16302 mom_manager_port=16303
> 
> That way the server knows where to send the requests.
> 
> --
> David Beer | Senior Software Engineer
> Adaptive Computing

-- 
David Beer | Senior Software Engineer
Adaptive Computing