[torqueusers] Need help with numa/cpuset

François P-L francois.prudhomme at hotmail.fr
Mon Jun 24 08:48:41 MDT 2013


Hello,
Thanks for your post. I confirm that version 4.1.6.h1 (which was briefly called 4.1.6.1... why the "h"1?) compiles without errors. But cpuset/hwloc is still in the same state for me.

> Subject: Re: [torqueusers] Need help with numa/cpuset
> From: dgottlieb at exchange.asc.edu
> Date: Mon, 24 Jun 2013 09:41:19 -0500
> CC: torqueusers at supercluster.org
> To: francois.prudhomme at hotmail.fr
> 
> FYI,
> 
> The "catch_child.c:1973: error: ‘sisters’ undeclared" bug was introduced in 4.1.6. Support said there'd be a 4.1.6.1 release to fix it, but there is no release yet. It looks like there's a commit from a week ago in the 4.1.6.h1 branch on GitHub.
> 
> Derek Gottlieb
> HPC Systems Analyst, CSC
> Alabama Supercomputer Center
> 
> 686 Discovery Dr., Huntsville, AL 35806
> High Performance Computing | dgottlieb at asc.edu | www.asc.edu
> 
> On Jun 12, 2013, at 2:42 PM, François P-L wrote:
> 
> > (Sorry if this message appears 3 times... I made many mistakes... :()
> > 
> > Hello,
> > 
> > I'm asking for help with two things on my torque cluster:
> > - 1) Using mom.layout to make better use of my nodes
> > - 2) Using cpusets
> > 
> > Currently, to do this, I'm using the 4.1.6 branch, configured with these options:
> > --prefix=/usr
> > --enable-syslog
> > --disable-gui
> > --with-sched=no
> > --enable-nvidia-gpus
> > --enable-numa-support
> > --enable-cpuset
> > --with-tcp-retry-limit=5
> > 
> > I'm using version 1.7.1 of hwloc on Debian squeeze with the 3.2.0-0.bpo.3-amd64 kernel.
> > 
> > The only problem during make was:
> > catch_child.c:1973: error: ‘sisters’ undeclared (first use in this function)
> > catch_child.c:1973: error: (Each undeclared identifier is reported only once
> > catch_child.c:1973: error: for each function it appears in.)
> > 
> > I don't know why the declaration of this identifier is conditional (lines 1685-1689)... I deleted the #ifndef/#endif to work around the problem.
> > 
> > I'm running all the packages on a VM with 4 CPUs and a very minimal config:
> > create queue batch
> > set queue batch queue_type = Execution
> > set queue batch resources_default.ncpus = 1
> > set queue batch resources_default.nodes = 1
> > set queue batch enabled = True
> > set queue batch started = True
> > set server acl_hosts = test2
> > set server default_queue = batch
> > set server log_events = 511
> > set server mail_from = adm
> > set server scheduler_iteration = 600
> > set server node_check_rate = 150
> > set server tcp_timeout = 300
> > set server job_stat_rate = 45
> > set server poll_jobs = True
> > set server mom_job_sync = True
> > set server next_job_number = 11
> > set server moab_array_compatible = True
> > 
> > cat /var/spool/torque/server_priv/nodes
> > test2 np=4 num_node_boards=1
> > 
> > cat /var/spool/torque/mom_priv/mom.layout
> > nodes=0
> > 
> > (and maui for scheduling)
> > 
> > 
> > Once everything is launched, it works... but hwloc doesn't do its job. If I run a load generator such as "stress" on 2 CPUs (stress -t 120 -c 2) in a qsub job requesting 1 CPU:
> > - /dev/cpuset/torque/"jobid"/ is created correctly, but its cpus file is empty
> > - htop shows load on 2 CPUs
> > - /dev/cpuset/torque/cpus is empty
> > - "lstopo --ps" doesn't show anything...
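
The cpuset checks in the list above can also be scripted. This is only a minimal sketch: it assumes the cpuset filesystem is mounted at /dev/cpuset and that each job directory holds a file named "cpus", as described above. The helper names parse_cpu_list and job_cpus are made up for illustration.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0-2,5" into a set of ints.

    An empty or whitespace-only string gives an empty set, which is the
    symptom described above.
    """
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def job_cpus(jobid, root="/dev/cpuset/torque"):
    """Return the CPUs listed in a job's cpuset (path layout is an assumption)."""
    with open(os.path.join(root, jobid, "cpus")) as handle:
        return parse_cpu_list(handle.read())
```

Calling job_cpus() with a real job id and getting an empty set back would correspond to the empty cpus file reported here.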
> > 
> > Maybe a configuration problem? When I look at the mom logs:
> > 06/12/2013 17:00:31;0002;   pbs_mom.4754;Svr;pbs_mom;Torque Mom Version = 4.1.6, loglevel = 0
> > 06/12/2013 17:00:36;0002;   pbs_mom.4754;Svr;setup_program_environment;machine topology contains 0 memory nodes, 4 cpus
> > 06/12/2013 17:00:36;0002;   pbs_mom.4754;node;read_layout_file;nodeboard  0: 1 NUMA nodes: 0
> > 06/12/2013 17:00:36;0002;   pbs_mom.4754;node;read_layout_file;Setting up this mom to function as 1 numa nodes
> > 06/12/2013 17:00:36;0002;   pbs_mom.4754;node;setup_nodeboards;nodeboard  0: 0 cpus (), 1 mems (0)
> > 06/12/2013 17:00:36;0002;   pbs_mom.4754;Svr;init_torque_cpuset;Init cpuset /dev/cpuset/torque
> > 06/12/2013 17:00:36;0002;   pbs_mom.4754;Svr;init_torque_cpuset;setting cpus =
> > 06/12/2013 17:00:36;0002;   pbs_mom.4754;Svr;init_torque_cpuset;setting mems = 0
> > 
> > Why is "setting cpus" empty?
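
In a longer log these lines are easy to miss. The sketch below assumes the semicolon-separated record layout seen in the excerpt above (timestamp, code, daemon, class, function, message) and flags records that report a nodeboard with 0 cpus or an empty "setting cpus =". The function name empty_cpu_lines is hypothetical and the patterns are taken only from the lines quoted here, not from the TORQUE source.

```python
import re

# A nodeboard record looks like: "nodeboard  0: 0 cpus (), 1 mems (0)"
NODEBOARD = re.compile(r"nodeboard\s+(\d+):\s+(\d+) cpus \(([^)]*)\)")


def empty_cpu_lines(log_lines):
    """Yield (function, message) for log records that report no CPUs."""
    for line in log_lines:
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) != 6:
            continue  # not a record in the format shown in the excerpt
        function, message = fields[4], fields[5]
        board = NODEBOARD.search(message)
        if board and int(board.group(2)) == 0:
            yield function, message
        elif message.strip() == "setting cpus =":
            yield function, message
```

Run against the excerpt above, this picks out the setup_nodeboards and init_torque_cpuset records, which follow the line reporting a topology with 0 memory nodes.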
> > 
> > A test with "hwloc-bind core:0 -- stress -t 120 -c 2 &" works well.
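
A similar sanity check is possible without hwloc. The sketch below is not what hwloc-bind does internally; it uses Python's os.sched_getaffinity and os.sched_setaffinity (Linux only) to confine the current process to a single CPU. The function name is made up, and it picks the lowest CPU the process is already allowed to use, which may or may not be CPU 0 on a given machine.

```python
import os


def pin_to_lowest_cpu():
    """Restrict the calling process to the lowest CPU it may currently use.

    Returns the resulting affinity set. Linux only: os.sched_getaffinity
    and os.sched_setaffinity do not exist on every platform.
    """
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    target = min(allowed)
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)
```

Child processes started afterwards inherit the mask, so a load generator launched from the same script should stay on that one CPU if kernel-level binding works.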
> > 
> > 
> > Does anyone have an idea?
> > 
> > Many thanks in advance :)
> > 
> 
 		 	   		  