[torqueusers] Need help with numa/cpuset

Derek Gottlieb dgottlieb at exchange.asc.edu
Mon Jun 24 08:41:19 MDT 2013


FYI,

The "catch_child.c:1973: error: ‘sisters’ undeclared" bug they introduced in 4.1.6.  Support said there'd be a 4.1.6.1   release to fix this, but no release yet.  Looks like there's a commit from a week ago in the 4.1.6.h1 branch in github.

Derek Gottlieb
HPC Systems Analyst, CSC
Alabama Supercomputer Center

686 Discovery Dr., Huntsville, AL 35806
High Performance Computing | dgottlieb at asc.edu | www.asc.edu

On Jun 12, 2013, at 2:42 PM, François P-L wrote:

> (Sorry if this message appears 3 times... I made several mistakes sending it... :()
> 
> Hello,
> 
> I'm asking for help with 2 things on my Torque cluster:
> - 1) Using mom.layout to make better use of my nodes
> - 2) Using cpusets
> 
> Currently I'm using the 4.1.6 branch for this, configured with these options (combined into a single command after the list):
> --prefix=/usr
> --enable-syslog
> --disable-gui
> --with-sched=no
> --enable-nvidia-gpus
> --enable-numa-support
> --enable-cpuset
> --with-tcp-retry-limit=5
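> 
> Combined into a single command, that's:
> ./configure --prefix=/usr --enable-syslog --disable-gui --with-sched=no --enable-nvidia-gpus --enable-numa-support --enable-cpuset --with-tcp-retry-limit=5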
> 
> I'm using hwloc version 1.7.1 on Debian squeeze with a 3.2.0-0.bpo.3-amd64 kernel.
> 
> The only problem during make was:
> catch_child.c:1973: error: ‘sisters’ undeclared (first use in this function)
> catch_child.c:1973: error: (Each undeclared identifier is reported only once
> catch_child.c:1973: error: for each function it appears in.)
> 
> I don't know why the declaration of this identifier is wrapped in a preprocessor conditional around lines 1685-1689... I deleted the #ifndef/#endif to work around the problem.
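> 
> Schematically, the problem looks like this (an illustrative sketch, not the real catch_child.c code; the guard macro name is invented):
> 
> #ifndef SOME_FEATURE_GUARD  /* sketch: this macro appears to be defined in my build, */
>   int sisters;              /* so the declaration block is skipped entirely          */
> #endif
>   /* ... later, around line 1973, in the same function ... */
>   sisters = 0;              /* still used unconditionally -> 'sisters' undeclared    */
> 
> Deleting the #ifndef/#endif makes the declaration unconditional, which is why the build then goes through.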
> 
> I'm running everything on a VM with 4 CPUs and a very minimal config:
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.ncpus = 1
> set queue batch resources_default.nodes = 1
> set queue batch enabled = True
> set queue batch started = True
> set server acl_hosts = test2
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 300
> set server job_stat_rate = 45
> set server poll_jobs = True
> set server mom_job_sync = True
> set server next_job_number = 11
> set server moab_array_compatible = True
> 
> cat /var/spool/torque/server_priv/nodes
> test2 np=4 num_node_boards=1
> 
> cat /var/spool/torque/mom_priv/mom.layout
> nodes=0
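> 
> (As a side note, and purely as something to verify against the 4.1.x NUMA documentation rather than a form I have tested: I have also seen mom.layout entries that list the cpus and memory explicitly, which for a single node board on this 4-CPU VM would look something like:
> 
> cpus=0-3  mem=0
> 
> I don't know whether 4.1.6 with hwloc treats the two forms differently.)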
> 
> (and Maui for scheduling)
> 
> 
> When everything is launched, it works... but hwloc doesn't seem to do its job. If I launch a load generator such as "stress" on 2 CPUs (stress -t 120 -c 2) from a qsub job requesting 1 CPU (a sketch of the submission follows this list):
> - /dev/cpuset/torque/"jobid"/ is created correctly, but its cpus file is empty
> - A look with htop shows load on 2 CPUs
> - /dev/cpuset/torque/cpus is empty
> - "lstopo --ps" doesn't show anything...
> 
> Maybe a configuration problem? When I look at the mom logs:
> 06/12/2013 17:00:31;0002;   pbs_mom.4754;Svr;pbs_mom;Torque Mom Version = 4.1.6, loglevel = 0
> 06/12/2013 17:00:36;0002;   pbs_mom.4754;Svr;setup_program_environment;machine topology contains 0 memory nodes, 4 cpus
> 06/12/2013 17:00:36;0002;   pbs_mom.4754;node;read_layout_file;nodeboard  0: 1 NUMA nodes: 0
> 06/12/2013 17:00:36;0002;   pbs_mom.4754;node;read_layout_file;Setting up this mom to function as 1 numa nodes
> 06/12/2013 17:00:36;0002;   pbs_mom.4754;node;setup_nodeboards;nodeboard  0: 0 cpus (), 1 mems (0)
> 06/12/2013 17:00:36;0002;   pbs_mom.4754;Svr;init_torque_cpuset;Init cpuset /dev/cpuset/torque
> 06/12/2013 17:00:36;0002;   pbs_mom.4754;Svr;init_torque_cpuset;setting cpus =
> 06/12/2013 17:00:36;0002;   pbs_mom.4754;Svr;init_torque_cpuset;setting mems = 0
> 
> Why "setting cpus" is empty ?
> 
> Tests with a "hwloc-bind core:0 -- stress -t 120 -c 2 &" working well.
> 
> 
> Does anyone have an idea?
> 
> Many thanks in advance :)
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


